Abstract


This report summarizes our research for the year from Jan. 05 to Dec. 05. We developed multi-category classifiers based on seismic data to classify heavy-tracked, light-tracked, heavy-wheeled and light-wheeled ground vehicles. We focused on data collected in the normal terrain.

We also developed fusion algorithms for type-1 and type-2 Fuzzy Logic Rule-Based Classifiers (FL-RBCs) based on the Choquet Fuzzy Integral (CFI). We conducted experiments to evaluate the performance of the classifiers and the effectiveness of seismic data for classification. We also conducted experiments to evaluate the performance of fused classifiers (seismic and acoustic) and to determine whether fusion improves performance. Our results show that binary classification between tracked and wheeled vehicles is effective using seismic data. However, due to the inherent unreliability of the seismic data, the classifiers based on seismic data performed poorly compared with the classifiers based on acoustic data. Fusing the two classifiers also did not yield any appreciable improvement in performance.

We note that the FL-RBCs performed better than the Bayesian equivalent in all the experiments. This indicates that FL-RBCs are better suited to handling uncertainties in the data.


Chapter 1
Introduction


The emissions of ground vehicles contain a wealth of information that can be used for vehicle classification, e.g., on the battlefield. The emissions can be modeled, in simplified form, as the sum of periodic components and noise. The periodic components account for the periodic movements in the engine, and the noise accounts for the propulsion process in the engine and the interactions between the vehicle and the road. Because the operating mechanisms differ from vehicle to vehicle, it is possible to distinguish among heavy-tracked, light-tracked, heavy-wheeled and light-wheeled vehicles; however, uncertainties that arise from the non-stationary nature of the data, different speeds and data corruption make this a challenging task.

In previous years [11], a complete type-2 fuzzy logic system (FLS) theory was established with IF-THEN rules and was used to design and implement Fuzzy Logic Rule-Based Classifiers (FL-RBCs) to classify ground vehicles based only on their acoustic signatures. Binary and multi-category classifiers were developed for the normal terrain using the Acoustic-Seismic Classification/Identification Data Set (ACIDS). Some of the salient lessons learned in previous years are:


• The non-hierarchical type-2 (T2) FL-RBC performed the best among the different classifier architectures and has the lowest complexity of the FL-RBCs. We used this architecture in this year's study.

• Adaptive majority voting significantly improves the performance of classifiers based on acoustic data. We implemented this scheme in this year's study as well.

This year, we developed multi-category FL-RBCs for classifying heavy-tracked, heavy-wheeled, light-tracked and light-wheeled ground vehicles in the normal terrain using seismic data. We also developed a Bayesian classifier to provide a performance baseline for the FL-RBCs. We investigated the performance gains obtained by fusing classifier outputs with a decision fusion technique called the Choquet Fuzzy Integral (CFI).

This report summarizes our research during this year and is organized as follows:

• Data pre-processing, which consists of data analysis, prototype extraction and feature extraction, is briefly reviewed in Chapter 2.

• In addition to the Bayesian classifier, we designed and implemented the non-hierarchical FL-RBC for type-1 and type-2 fuzzy sets. The classifier designs and a brief review of fuzzy logic systems are presented in Chapters 3-5.

• The CFI as a technique for decision fusion is reviewed in Chapter 6, along with fusion architectures for implementing decision fusion.

• Experiments were performed to evaluate the performance of the classifiers based on seismic data and of the fused classifiers. Descriptions of these experiments and the results obtained are presented in Chapter 7.

• Conclusions are drawn in Chapter 8.
Chapter 2

Data Pre-Processing


Pre-processing was done on the Acoustic-Seismic Classification/Identification Data Set (ACIDS), which consists of more than 270 records of acoustic and seismic data collected for nine vehicles in four different environmental conditions. We focused only on data collected in the normal terrain. The vehicle categories are: Heavy-Track, Light-Track, Heavy-Wheel and Light-Wheel.

2.1 Prototype Generation

The seismic data were collected by a single vertical-axis geophone and contain the raw digitized seismic signatures of the different vehicles. The data were low-pass filtered at 200 Hz by a fourth-order Chebyshev filter and then sampled at 512 Hz. The data are corrupted by noise and by acoustic signals.

A complete run lasts from under 60 seconds to over 300 seconds. The data are non-stationary and the signal-to-noise ratio (SNR) changes constantly. These factors make it impossible to process an entire run as a single entity, so we segmented the data into two-second blocks, called prototypes. Since the SNR is highest at the closest point of approach (CPA) of the vehicle, we used this point as the frame of reference for generating prototypes. A 1024-point (two seconds at 512 Hz) rectangular window with 50% overlap was slid across the data before and after the CPA to generate 90 prototypes for each run.
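The segmentation above can be sketched as follows. This is a minimal illustration, not the code used in the study: it assumes the CPA has already been located as a sample index, and that the 90 windows are split evenly, 45 starting before the CPA and 45 starting at or after it (the text says only "before and after the CPA"). The function name `generate_prototypes` is hypothetical.

```python
def generate_prototypes(signal, cpa_index, n_prototypes=90, win=1024, overlap=0.5):
    """Cut a run into overlapping rectangular-window prototypes around the CPA.

    Assumption: window starts are cpa_index + k*hop for
    k = -n_prototypes//2, ..., n_prototypes - n_prototypes//2 - 1.
    """
    hop = int(win * (1 - overlap))  # 512 samples = 1 s at 512 Hz
    half = n_prototypes // 2
    starts = [cpa_index + k * hop for k in range(-half, n_prototypes - half)]
    if starts[0] < 0 or starts[-1] + win > len(signal):
        raise ValueError("run too short for the requested prototypes around the CPA")
    # A rectangular window is simply an unweighted slice of the signal.
    return [signal[s:s + win] for s in starts]
```

With a 1024-point window and 50% overlap, the hop is 512 samples (1 s at 512 Hz), so the 90 prototypes span about 91 s of data around the CPA.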

2.2 Feature Extraction

We used the amplitudes of the second through 12th harmonics of the fundamental frequency $f_0$ as the features for classification. Consequently, one of the most important steps in the feature extraction process is the identification of the fundamental frequency.
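Given an estimate of $f_0$, the 11 features could be read off the spectrum of each prototype as sketched below. The report does not say how the harmonic amplitudes were measured; this sketch assumes the magnitude of the DFT bin nearest to each harmonic $h f_0$ ($h = 2, \ldots, 12$), scaled by $2/N$ so that a sinusoid of amplitude $A$ lying exactly on a bin yields $A$. The name `harmonic_features` is hypothetical.

```python
import cmath
import math

def harmonic_features(prototype, f0, fs=512.0, harmonics=range(2, 13)):
    """Amplitudes of harmonics 2..12 of f0 (11 features) from single-bin DFTs.

    Assumption: amplitude = 2/N * |X[k]| at the bin k nearest to h*f0.
    """
    n = len(prototype)
    feats = []
    for h in harmonics:
        k = round(h * f0 * n / fs)  # nearest DFT bin to h*f0
        if not 0 < k < n // 2:
            raise ValueError("harmonic %d of f0=%g Hz is outside (0, fs/2)" % (h, f0))
        # X[k] = sum_t x[t] * exp(-2j*pi*k*t/n)
        xk = sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(prototype))
        feats.append(2.0 * abs(xk) / n)
    return feats
```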

In our previous ARL study, we used the Harmonic Line Association (HLA) algorithm to determine the fundamental frequencies and to extract acoustic features. Because of acoustic coupling into the measured seismic data, the harmonics at higher frequencies are corrupted, which makes it difficult to determine the seismic $f_0$ with the HLA algorithm. Instead, we considered a frequency-domain estimation algorithm (cepstral analysis) and a time-domain algorithm (the autocorrelation method) to estimate the fundamental frequency. For the reasons given in Section 2.2.2, we abandoned the autocorrelation method.

2.2.1 Frequency-domain $f_0$ estimation: Cepstrum

A reliable way to estimate the dominant fundamental frequency is to use the cepstrum [1], which is a Fourier analysis of the logarithmic magnitude spectrum of the signal. If the log-magnitude spectrum contains many regularly spaced harmonics, the Fourier analysis of this spectrum will show a peak corresponding to the spacing between the harmonics, i.e., the fundamental frequency. The cepstral method was developed for homomorphic processing: taking the logarithm turns components that multiply in the spectrum (convolve in time) into components that add, so that they can be separated by linear operations. Effectively, we treat the signal spectrum as another signal and look for periodicity in the spectrum itself. This method works well for low fundamental frequencies.

Note that the name "cepstrum" comes from reversing the first four letters of "spectrum". The horizontal axis of the cepstrum has units of quefrency, and the amplitude peaks in the cepstrum (which relate to periodicities in the spectrum) are called rahmonics.

To obtain an estimate of the fundamental frequency from the cepstrum, we looked for a peak in the quefrency region corresponding to typical seismic fundamental frequencies, which we took to be the interval [1, 20] Hz. Since a frequency $f$ corresponds to a quefrency of $1/f$, this interval corresponds to quefrencies from 0.05 s to 1 s (about 26 to 512 samples at 512 Hz).
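A minimal sketch of cepstral $f_0$ estimation over the [1, 20] Hz band, written in plain Python with a small radix-2 FFT so that it has no dependencies. Assumptions not stated in the report: a Hann taper is applied before the FFT, a small constant `eps` guards the logarithm, and the real cepstrum is computed as the inverse DFT of the log-magnitude spectrum. All function names are hypothetical.

```python
import cmath
import math

def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out

def ifft(spectrum):
    """Inverse DFT via the conjugation identity."""
    n = len(spectrum)
    return [v.conjugate() / n for v in fft([s.conjugate() for s in spectrum])]

def cepstral_f0(frame, fs=512.0, f_min=1.0, f_max=20.0, eps=1e-6):
    """f0 from the largest real-cepstrum peak at quefrencies in [1/f_max, 1/f_min] s."""
    n = len(frame)
    # Periodic Hann taper (an assumption of this sketch).
    windowed = [x * 0.5 * (1 - math.cos(2 * math.pi * i / n)) for i, x in enumerate(frame)]
    log_mag = [math.log(abs(v) + eps) for v in fft(windowed)]
    cepstrum = [c.real for c in ifft([complex(v) for v in log_mag])]
    q_lo = math.ceil(fs / f_max)          # shortest quefrency in samples (20 Hz)
    q_hi = min(int(fs / f_min), n // 2)   # longest quefrency in samples (1 Hz)
    q_peak = max(range(q_lo, q_hi + 1), key=lambda q: cepstrum[q])
    return fs / q_peak
```

For a 1024-sample prototype at 512 Hz, a peak at 64 samples, for example, gives $f_0 = 512/64 = 8$ Hz. The taper matters in this sketch: without it, an exactly periodic harmonic comb produces rahmonics of nearly equal height, and the peak search can lock onto a multiple of the true period.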

2.2.2 Time-domain $f_0$ estimation: Autocorrelation

The autocorrelation function measures the "similarity" between a signal and a shifted version of itself. To detect $f_0$, we took a window of the signal whose length is at least twice the longest (potentially) detectable period. In our case, the longest period of interest is 1 s (corresponding to 1 Hz), so the window length was 1024 samples at a sampling rate of 512 Hz. The autocorrelation function was then calculated for this section of the signal, and the fundamental frequency was estimated by looking for a peak in the lag interval corresponding to the typical range of seismic frequencies, i.e., [1, 20] Hz [2].

This method produces peaks at sub-harmonics as well as at the fundamental frequency, and it is difficult to determine which peak corresponds to the fundamental frequency. Furthermore, the method does not work well when the signal varies rapidly, and it is computationally more intensive than the cepstral method. In our experiments, $f_0$ estimation using autocorrelation yielded frequencies that were too high. In the sequel, we use only the cepstral method for estimating the fundamental frequency.
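Although we abandoned it, the autocorrelation estimator described in this section can be sketched in a few lines for comparison with the cepstral method. The sketch is illustrative only: it uses the biased (unnormalized) autocorrelation sum and searches lags between $f_s/20$ and $f_s/1$ samples; `autocorr_f0` is a hypothetical name.

```python
import math

def autocorr_f0(frame, fs=512.0, f_min=1.0, f_max=20.0):
    """f0 from the largest autocorrelation value at lags in [fs/f_max, fs/f_min] samples."""
    n = len(frame)
    lag_lo = math.ceil(fs / f_max)        # 26 samples for 20 Hz
    lag_hi = min(int(fs / f_min), n - 1)  # 512 samples for 1 Hz

    def r(lag):
        # Biased estimate: fewer products at longer lags, so repeats of a peak shrink.
        return sum(frame[i] * frame[i + lag] for i in range(n - lag))

    best_lag = max(range(lag_lo, lag_hi + 1), key=r)
    return fs / best_lag
```

On real seismic prototypes, the peaks at multiples of the true period can have similar heights, which is the sub-harmonic ambiguity noted above.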

2.3 Distribution of Features

Features were extracted for each prototype in a run corresponding to one of the nine vehicle types. The number of runs available for each of the nine vehicles is:

Heavy-Tracked a: 14 runs
Heavy-Tracked b: 8 runs
Heavy-Tracked c: 15 runs
Heavy-Tracked d: 8 runs
Light-Tracked a: 15 runs
Heavy-Wheeled a: 8 runs
Heavy-Wheeled b: 8 runs
Light-Wheeled a: 8 runs
Light-Wheeled b: 4 runs

For each of these 88 runs, we computed the mean (run-mean) and standard deviation (run-standard-deviation) of the magnitude of each feature. We represented the distribution of the $i$-th feature ($i = 1, 2, \ldots, 11$) by the interval [run-mean - 2 × run-standard-deviation, run-mean + 2 × run-standard-deviation]. Thus, each run is represented by 11 intervals, one for each feature. These intervals are the ranges into which the feature values fall with high probability (about 95% if the values were Gaussian). Figure 2.1 shows the distribution of the features for the tracked vehicles and Figure 2.2 shows the distribution of the features for the wheeled vehicles. Each bar in the figures denotes a complete run. There are 88 bars in each plot.
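The per-run intervals can be computed as below. This sketch assumes the sample standard deviation (divisor $N - 1$); the report does not state which estimator was used. `run_feature_intervals` is a hypothetical name.

```python
import statistics

def run_feature_intervals(prototype_features, k=2.0):
    """Per-feature [mean - k*std, mean + k*std] intervals for one run.

    prototype_features: list of feature vectors, one per prototype (11 values each).
    Assumption: sample standard deviation (statistics.stdev).
    """
    intervals = []
    for column in zip(*prototype_features):  # iterate feature by feature
        mu = statistics.mean(column)
        sd = statistics.stdev(column)
        intervals.append((mu - k * sd, mu + k * sd))
    return intervals
```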

Observe from Figs. 2.1 and 2.2 that:

1. The wheeled-vehicle distributions show an approximately exponential decay in the magnitude of the features from feature 1 to feature 11.

2. The magnitudes of the features for the wheeled vehicles are considerably lower than those of the tracked vehicles.

3. The magnitudes of the features for almost all the vehicles (except HT-c) decrease as the feature index increases; hence, the higher-harmonic seismic features may be unreliable.
Figure 2.1: Distribution of features for tracked vehicles. (a)-(d) show the features for the four Heavy-Track vehicles and (e) shows the features for the one Light-Track vehicle.
[Figure 2.2 panel titles: (a) Range of features for Heavy Wheel Vehicle Type A; (b) Range of features for Heavy Wheel Vehicle Type B]

Figure 2.2: Distribution of features for wheeled vehicles. (a) and (b) show the features for the two Heavy-Wheel vehicles; (c) and (d) show the features for the Light-Wheel vehicles.
Figure 2.3: Average signal energy on a log scale for LT, HT, LW and HW vehicles: (a) seismic data and (b) acoustic data. Each plot has only four data points, one for each vehicle class.

2.4 Energy Distribution

The energy of a run is the integral, over the run, of the squared amplitude of the sensor signal (acoustic or seismic) under consideration. We computed this energy for each run and then averaged the results over all runs of a specific vehicle type to obtain the average energy for that type. Our results are summarized in Fig. 2.3.
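A sketch of the energy computation, assuming the integral is approximated by the sum of squared samples times the sampling interval $1/f_s$ and that the log scale in Fig. 2.3 is base 10. The function names are hypothetical.

```python
import math
from collections import defaultdict

def run_energy(samples, fs=512.0):
    """Energy of one run: integral of x(t)^2, approximated by sum(x^2) / fs."""
    return sum(x * x for x in samples) / fs

def mean_log_energy_by_type(runs, fs=512.0):
    """Average run energy per vehicle type, on a log10 scale.

    runs: iterable of (vehicle_type, samples) pairs.
    """
    totals, counts = defaultdict(float), defaultdict(int)
    for vtype, samples in runs:
        totals[vtype] += run_energy(samples, fs)
        counts[vtype] += 1
    return {v: math.log10(totals[v] / counts[v]) for v in totals}
```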

From Fig. 2.3, we observe that:

1. The seismic data show appreciably lower energy levels for wheeled vehicles, which could be due to the absence of "track slap" in wheeled vehicles. Average signal energy could therefore be used as a classification feature, providing a quick way to distinguish between wheeled and tracked vehicles. We pursue this idea in the next section.

2. The same observation cannot be made for the acoustic data set, because the HW vehicles have high acoustic energy levels.

2.5 Binary Classification of Tracked vs. Wheeled Vehicles

Since we use only the energy feature, prototypes were chosen according to their energy levels: for each run, the 90 prototypes with the highest energies were chosen. The energy in each prototype was used as the single feature for classifying tracked versus wheeled vehicles, i.e., the energy in each two-second data block in a run was used as a feature. As can be seen in Fig. 2.3, the energy levels for tracked vehicles differ considerably from those for wheeled vehicles.

Table 2.1: Estimated errors in classification using different evaluation methods.

Evaluation Method    Classifier Error
Holdout              0
Cross-Validation     0
Leave-One-Out        0
Re-Substitution      0

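Since we have only a one-dimensional feature set, the classification problem reduces to finding a threshold for the data set. A number of simple distribution-free classification algorithms can be used for this purpose, e.g., the Perceptron and Widrow-Hoff algorithms. We used the Ho-Kashyap algorithm, which minimizes the squared error $\|Ya - b\|^2$ jointly over the weight vector $a$ and a margin vector $b > 0$, where the rows of $Y$ are the augmented training samples (negated for the second class). For separable data, it yields a threshold that places every prototype on the correct side with a positive margin. Details of the algorithm can be found in [3].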
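For a single feature, the Ho-Kashyap procedure can be sketched as below, following the textbook form of the algorithm (the update $b \leftarrow b + \eta(e + |e|)$ with $0 < \eta < 1$, then $a = Y^{+} b$). This is an illustration under our own choices of step size, tolerance and iteration cap, not the code used for Table 2.1; `ho_kashyap_threshold` is a hypothetical name.

```python
def ho_kashyap_threshold(class1, class2, eta=0.5, tol=1e-6, max_iter=10000):
    """1-D Ho-Kashyap training; returns a = (a0, a1).

    Decision rule: class1 if a0 + a1*x > 0, else class2.
    Rows of Y are augmented samples [1, x], negated for class2.
    """
    Y = [(1.0, x) for x in class1] + [(-1.0, -x) for x in class2]
    # Pseudo-inverse Y+ = (Y^T Y)^-1 Y^T, solved via the 2x2 normal equations.
    s00 = sum(r[0] * r[0] for r in Y)
    s01 = sum(r[0] * r[1] for r in Y)
    s11 = sum(r[1] * r[1] for r in Y)
    det = s00 * s11 - s01 * s01

    def solve(b):
        t0 = sum(r[0] * bi for r, bi in zip(Y, b))
        t1 = sum(r[1] * bi for r, bi in zip(Y, b))
        return ((s11 * t0 - s01 * t1) / det, (s00 * t1 - s01 * t0) / det)

    b = [1.0] * len(Y)  # initial positive margins
    a = solve(b)
    for _ in range(max_iter):
        e = [r[0] * a[0] + r[1] * a[1] - bi for r, bi in zip(Y, b)]
        if max(abs(ei) for ei in e) <= tol:
            break
        # Raise only the margins whose error is positive.
        b = [bi + eta * (ei + abs(ei)) for bi, ei in zip(b, e)]
        a = solve(b)
    return a
```

The resulting threshold is $x^* = -a_0/a_1$; with the tracked-vehicle energies as the first class, a prototype is labeled tracked when $a_0 + a_1 x > 0$.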

2.5.1 Non-Adaptive Classification

In non-adaptive classification, the energy levels of the prototypes in a run were summed to obtain the energy of the entire run. The energies of the tracked-vehicle runs and of the wheeled-vehicle runs were computed and used as samples for the Ho-Kashyap classifier. The performance of the classifier was evaluated using the holdout, cross-validation, leave-one-out and re-substitution methods.

In the holdout method, 10% of the samples were drawn at random and used for testing, while the rest of the data were used for training. This was repeated a large number of times (over 1000), and the average error over all trials provided an estimate of the classifier error. In 10-fold cross-validation, the data set was divided into 10 mutually exclusive subsets. Testing was done on one of the 10 subsets, whereas training was done on the remaining nine subsets. The estimated classifier error is the error averaged over all 10 possible designs. The leave-one-out method consisted of as many designs as there are runs. In the $l$-th design, the $l$-th run was set aside for testing and the other runs were used for training. As before, the error averaged over all the designs was the estimated classifier error. In the re-substitution method, the entire data set was used for both training and testing. Results for these four methods are summarized in Table 2.1. As seen from this table, perfect classification was obtained, which indicates that the feature set is separable for this binary classification problem. Using only energy as a feature, we were able to classify tracked and wheeled vehicles accurately when we used entire runs.
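The four evaluation methods can be sketched with a small harness that accepts any training function. To keep the sketch self-contained, it uses a hypothetical midpoint-of-class-means threshold as a stand-in for the Ho-Kashyap trainer; the fold assignment (index modulo $k$), the random seed and the function names are likewise assumptions.

```python
import random

def midpoint_trainer(xs, labels):
    """Hypothetical stand-in trainer: threshold halfway between the class means.
    Returns a predictor x -> label (1 = tracked, 0 = wheeled)."""
    hi = [x for x, y in zip(xs, labels) if y == 1]
    lo = [x for x, y in zip(xs, labels) if y == 0]
    thr = (sum(hi) / len(hi) + sum(lo) / len(lo)) / 2
    return lambda x: 1 if x > thr else 0

def error_rate(predict, xs, labels):
    return sum(predict(x) != y for x, y in zip(xs, labels)) / len(xs)

def resubstitution(train, xs, ys):
    """Train and test on the full data set."""
    return error_rate(train(xs, ys), xs, ys)

def k_fold(train, xs, ys, k):
    """k mutually exclusive folds, error averaged over folds; k = len(xs) is leave-one-out."""
    errs = []
    for f in range(k):
        test = [i for i in range(len(xs)) if i % k == f]
        tr = [i for i in range(len(xs)) if i % k != f]
        model = train([xs[i] for i in tr], [ys[i] for i in tr])
        errs.append(error_rate(model, [xs[i] for i in test], [ys[i] for i in test]))
    return sum(errs) / k

def holdout(train, xs, ys, frac=0.1, trials=1000, seed=0):
    """Repeatedly hold out a random fraction for testing; average the test errors."""
    rng = random.Random(seed)
    n_test = max(1, round(frac * len(xs)))
    errs = []
    for _ in range(trials):
        test = set(rng.sample(range(len(xs)), n_test))
        tr = [i for i in range(len(xs)) if i not in test]
        model = train([xs[i] for i in tr], [ys[i] for i in tr])
        errs.append(error_rate(model, [xs[i] for i in test], [ys[i] for i in test]))
    return sum(errs) / trials
```

Calling `k_fold` with `k=10` gives 10-fold cross-validation, and `k=len(xs)` gives leave-one-out.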
Figure 2.4: Performance of the adaptive classifier, i.e., classifier error, as a function of time: (a) re-substitution method and (b) leave-one-out method.

2.5.2 Adaptive Classification

In many applications, we need to classify prototypes in real time and hence may not have the luxury of processing an entire run. In such cases, we use adaptive classification. In this approach, we computed the energy levels of the prototypes and summed them as they became available, and then classified on the basis of the data available so far. The performance of the adaptive classifier was evaluated using the re-substitution and holdout methods, which were described in the previous section. The Ho-Kashyap algorithm was used for training on the set of training prototypes specified by the evaluation method, and once the threshold was obtained, the test prototypes were classified adaptively. The average error for the two evaluation methods is plotted in Fig. 2.4 as a function of time (which corresponds to the number of prototypes processed). Observe that perfect classification of tracked and wheeled vehicles is achieved when about half of the prototypes in a run have been processed. This corresponds to just over 50 seconds in real time. This makes intuitive sense because the closest point of approach (CPA) of the vehicle is reached about midway through a run, and the differences in the energy levels between tracked and wheeled vehicles are most prominent at this point.
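A sketch of adaptive classification on one run. The text does not say how the running sum is compared with the threshold learned during training; here we assume the sum is divided by the number of prototypes seen so far and compared with a per-prototype energy threshold. `adaptive_decisions` is a hypothetical name.

```python
def adaptive_decisions(prototype_energies, threshold):
    """Decision after each new prototype, using the energy accumulated so far.

    Assumption: the running sum is normalized by the number of prototypes seen.
    Returns one label per time step: "tracked" or "wheeled".
    """
    decisions, total = [], 0.0
    for k, energy in enumerate(prototype_energies, start=1):
        total += energy
        decisions.append("tracked" if total / k > threshold else "wheeled")
    return decisions
```

As the vehicle approaches the CPA and the prototype energies rise, the running decision can switch, which is the behavior plotted over time in Fig. 2.4.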
Chapter 3

Bayesian Classifier


The Bayesian classifier is based on our assumptions about the conditional probability density function (pdf) of each category. It has nine conditional probability models, one for each kind of ground vehicle, each described by a Gaussian pdf:

$$p(x \mid V_j) \sim \mathcal{N}(m_j, \Sigma_j), \qquad j = 1, \ldots, 9, \qquad (3.1)$$

where $x \in \mathbb{R}^n$ ($n = 11$) represents the feature vector (the magnitudes of the second through 12th harmonic components), $V_j$ represents the $j$-th kind of ground vehicle, and $m_j$ and $\Sigma_j$ are the mean and covariance matrix of the multivariate Gaussian distribution associated with $V_j$.

The mean and covariance matrix associated with $V_j$ were estimated from the prototypes of $V_j$, i.e.,

$$m_j = \frac{1}{N_j} \sum_{x \in V_j} x, \qquad (3.2)$$

$$\Sigma_j = \frac{1}{N_j - 1} \sum_{x \in V_j} (x - m_j)(x - m_j)^T, \qquad (3.3)$$

where $N_j$ is the number of prototypes of $V_j$.

Given an unlabeled feature vector $x'$ as the input, the Bayesian classifier first computes the log-likelihood $L(x' \mid V_j)$ for each kind of vehicle as

$$L(x' \mid V_j) = \log p(x' \mid V_j) = -\frac{n}{2}\log 2\pi - \frac{1}{2}\log|\Sigma_j| - \frac{1}{2}(x' - m_j)^T \Sigma_j^{-1} (x' - m_j), \qquad j = 1, \ldots, 9, \qquad (3.4)$$

in which the first term is common to all vehicles and can be dropped. The classifier then compares all the likelihoods to determine $V_{\max}$, the vehicle associated with the maximum likelihood. Finally, the Bayesian classifier assigns $x'$ to the same category as $V_{\max}$.
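Equations (3.2)-(3.4) and the maximum-likelihood decision can be sketched in dependency-free Python as below. Evaluating $\log|\Sigma_j|$ and the quadratic form through a Cholesky factorization is our implementation choice; the class and function names are hypothetical, and each vehicle needs enough prototypes for its covariance estimate to be positive definite.

```python
import math

def fit_gaussian(protos):
    """Mean (3.2) and unbiased covariance (3.3) of one vehicle's prototypes."""
    n, d = len(protos), len(protos[0])
    m = [sum(x[i] for x in protos) / n for i in range(d)]
    cov = [[sum((x[i] - m[i]) * (x[k] - m[k]) for x in protos) / (n - 1)
            for k in range(d)] for i in range(d)]
    return m, cov

def cholesky(a):
    """Lower-triangular L with a = L L^T (a symmetric positive definite)."""
    d = len(a)
    L = [[0.0] * d for _ in range(d)]
    for i in range(d):
        for k in range(i + 1):
            s = sum(L[i][t] * L[k][t] for t in range(k))
            L[i][k] = math.sqrt(a[i][i] - s) if i == k else (a[i][k] - s) / L[k][k]
    return L

def log_likelihood(x, m, L):
    """Eq. (3.4) without the constant: -0.5*log|S| - 0.5*(x-m)^T S^-1 (x-m), S = L L^T."""
    d = len(m)
    z = [0.0] * d
    for i in range(d):  # forward substitution solves L z = x - m
        z[i] = (x[i] - m[i] - sum(L[i][t] * z[t] for t in range(i))) / L[i][i]
    log_det = 2.0 * sum(math.log(L[i][i]) for i in range(d))
    return -0.5 * log_det - 0.5 * sum(v * v for v in z)

class BayesianClassifier:
    def __init__(self, prototypes_by_vehicle, category_of):
        # One Gaussian model per vehicle V_j; category_of maps vehicle -> category.
        self.models = {}
        for v, protos in prototypes_by_vehicle.items():
            m, cov = fit_gaussian(protos)
            self.models[v] = (m, cholesky(cov))
        self.category_of = category_of

    def classify(self, x):
        v_max = max(self.models, key=lambda v: log_likelihood(x, *self.models[v]))
        return self.category_of[v_max]
```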
Chapter 4

Brief Review of Fuzzy Logic Systems


We designed and implemented non-hierarchical FL-RBCs, which are based on the theory of fuzzy logic systems (FLSs). It is thus pertinent to include a brief review of the theory of FLSs before detailing the classifier designs.

An FLS consists of four components: a fuzzifier, a rule base, a fuzzy inference engine and an output processor. The rule base contains rules that are extracted from expert knowledge, mathematical models or data. Each rule describes a relation from the domain $X_1 \times \cdots \times X_p \subset \mathbb{R}^p$ to the range $Y \subset \mathbb{R}$ and can be expressed in the following IF-THEN form:

$$R^j: \text{IF } x_1 \text{ is } F_1^j \text{ and } x_2 \text{ is } F_2^j \text{ and } \ldots \text{ and } x_p \text{ is } F_p^j, \text{ THEN } y \text{ is } G^j, \qquad j = 1, \ldots, M,$$

where $M$ is the total number of rules, $R^j$ represents the $j$-th rule, $F_k^j$ is the antecedent associated with the $k$-th input variable $x_k$ ($k = 1, \ldots, p$) and $G^j$ is the consequent associated with the output variable $y$.

Given a vector of measurements $X' = [x'_1, \ldots, x'_p]^T$, the fuzzifier converts them into fuzzy sets $A_{x'_1}, \ldots, A_{x'_p}$, one for each dimension. The fuzzy inference engine then computes the firing degree of each rule, which describes how well the input fuzzy sets $A_{x'_1}, \ldots, A_{x'_p}$ match the antecedents $F_1^j, \ldots, F_p^j$. The output processor computes a crisp output, $y(X')$, from the firing degrees and the consequents of all the rules.

In summary, through fuzzification, inference and output processing, the FLS maps the input $X'$ to an output $y(X')$ according to the rules; however, because the input fuzzy sets, antecedents and consequents can be either type-1 or type-2 fuzzy sets, the specific computations of fuzzification, inference and output processing differ between type-1 and type-2 FLSs. We describe these computations in more detail in Sections 4.1 and 4.2.
4.1 Type-1 FLS


In a type-1 FLS, the input fuzzy sets, antecedents and consequents are all type-1 fuzzy sets, and output processing consists only of defuzzification. For consistency with the subsequent descriptions of the classifiers, we describe the operations and the optimization procedures of the type-1 FLS.


Rule base: In the type-1 implementation, the antecedents of each rule, $F_k^j$ ($k = 1, \ldots, p$), are modeled as type-1 fuzzy sets whose membership functions (MFs) are Gaussian, centered at $m_k^j$ with standard deviation $\sigma_k^j$, i.e.,

$$\mu_{F_k^j}(x_k) = \exp\left\{-\frac{1}{2}\left(\frac{x_k - m_k^j}{\sigma_k^j}\right)^2\right\} \equiv \phi(x_k; m_k^j, \sigma_k^j). \qquad (4.1)$$


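Fuzzification: Given the input feature vector $X' = [x'_1, \ldots, x'_p]^T$, the system encodes $x'_k$ ($k = 1, \ldots, p$) as a type-1 fuzzy set $A_k$ whose MF is Gaussian, centered at $x'_k$ with standard deviation $\sigma_k$, i.e.,

$$\mu_{A_k}(x_k) = \exp\left\{-\frac{1}{2}\left(\frac{x_k - x'_k}{\sigma_k}\right)^2\right\} = \phi(x_k; x'_k, \sigma_k). \qquad (4.2)$$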


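Fuzzy inference: For the type-1 FLS, the inference engine obtains one firing degree for each rule from the input and antecedent fuzzy sets by using the sup-star composition [7] with the product t-norm; i.e., the firing degree of the $j$-th rule, $f^j(X')$ ($j = 1, \ldots, M$), is computed [7] as

$$f^j(X') = \prod_{k=1}^{p} \sup_{x_k} \mu_{A_k}(x_k)\, \mu_{F_k^j}(x_k) \qquad (4.3)$$

$$f^j(X') = \prod_{k=1}^{p} \exp\left\{-\frac{1}{2}\,\frac{(x'_k - m_k^j)^2}{\sigma_k^2 + (\sigma_k^j)^2}\right\} = \prod_{k=1}^{p} \phi\left(x'_k; m_k^j, \sqrt{\sigma_k^2 + (\sigma_k^j)^2}\right). \qquad (4.4)$$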


Output processing: The output processing of the type-1 implementation consists only of defuzzification, which leads to a crisp output, $y(X')$, computed as [7]

$$y(X') = \frac{\sum_{j=1}^{M} f^j(X')\, g^j}{\sum_{j=1}^{M} f^j(X')}, \qquad (4.5)$$

where $f^j(X')$ and $g^j$ are the firing degree and the consequent, respectively, of the $j$-th rule.
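A sketch of the type-1 FLS computations (4.1)-(4.5), using the closed form (4.4) for the firing degrees. Rules are passed as hypothetical `(m_rule, s_rule, g)` tuples; the names are ours, not from [11].

```python
import math

def phi(x, m, s):
    """Gaussian MF phi(x; m, s) = exp(-0.5*((x - m)/s)**2)."""
    return math.exp(-0.5 * ((x - m) / s) ** 2)

def firing_degree(x_in, sigma_in, m_rule, s_rule):
    """Eq. (4.4): closed form of the product sup-star composition for Gaussian sets."""
    return math.prod(
        phi(xk, mk, math.sqrt(sk_in ** 2 + sk ** 2))
        for xk, sk_in, mk, sk in zip(x_in, sigma_in, m_rule, s_rule))

def type1_output(x_in, sigma_in, rules):
    """Eq. (4.5): firing-degree-weighted average of the crisp rule consequents.

    rules: list of (m_rule, s_rule, g) with per-feature centers and spreads.
    """
    f = [firing_degree(x_in, sigma_in, m, s) for m, s, _ in rules]
    return sum(fj * g for fj, (_, _, g) in zip(f, rules)) / sum(f)
```

The closed form (4.4) can be checked against a brute-force supremum of $\phi(x; x'_k, \sigma_k)\,\phi(x; m_k^j, \sigma_k^j)$ over a fine grid of $x$.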
4.1.1 Optimization

The parameters of a type-1 FLS are the input model parameters $\sigma_k$, the antecedent parameters $m_k^j$ and $\sigma_k^j$ ($k = 1, \ldots, p$; $j = 1, \ldots, M$), and the consequent parameters $g^j$ ($j = 1, \ldots, M$); there are a total of $p + M(2p + 1)$ design parameters.

We use the steepest descent algorithm to tune these parameters so as to minimize a given objective function $J$, i.e.,

$$\theta(\text{new}) = \theta(\text{old}) - \alpha \left.\frac{\partial J}{\partial \theta}\right|_{\theta = \theta(\text{old})}, \qquad (4.6)$$

where $\theta$ is the generic symbol for all parameters to be optimized and $\alpha > 0$ is the step size.

The partial derivative $\partial J / \partial \theta$ is computed through the partial derivative $\partial y / \partial \theta$ as

$$\frac{\partial J}{\partial \theta} = \frac{\partial J}{\partial y}\,\frac{\partial y}{\partial \theta},$$

where $y$ is the crisp output of the type-1 FLS. The calculation of $\partial J / \partial y$ depends on the specific form of $J$. The formulae for $\partial y / \partial \theta$ for the input, antecedent and consequent parameters of the type-1 FLS are given in [11].
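The steepest-descent update (4.6) and the chain rule above can be illustrated on the consequent parameters alone. This sketch assumes a squared-error objective $J = \frac{1}{2}(y - t)^2$ for a single training pair and holds the firing degrees fixed, so that $\partial y/\partial g^j = f^j / \sum_i f^i$ follows directly from (4.5); tuning the input and antecedent parameters requires the derivatives given in [11]. `tune_consequents` is a hypothetical name.

```python
def tune_consequents(firing, g, target, alpha=0.5, steps=200):
    """Steepest descent, eq. (4.6), on the consequents g^j of a type-1 FLS.

    Assumptions: J = 0.5 * (y - target)**2 for one training pair and fixed firing
    degrees f^j, so dJ/dy = y - target and dy/dg^j = f^j / sum(f) by eq. (4.5).
    """
    g = list(g)
    total = sum(firing)
    for _ in range(steps):
        y = sum(f * gj for f, gj in zip(firing, g)) / total
        dJ_dy = y - target
        # Chain rule: dJ/dg^j = dJ/dy * dy/dg^j
        g = [gj - alpha * dJ_dy * f / total for f, gj in zip(firing, g)]
    return g
```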


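4.2 Type-2 FLS

In the type-2 implementation of the FLS, the input fuzzy sets, antecedents and consequents are all type-2 fuzzy sets, and output processing consists of type-reduction and defuzzification. We used interval type-2 fuzzy sets in our implementation, so we illustrate the operations and optimization procedures for them.

Rule base: The antecedents of each rule, $F_k^j$ ($k = 1, \ldots, p$), are modeled as interval type-2 fuzzy sets whose MFs are Gaussian with uncertain means, $m \in [m_{1,k}^j, m_{2,k}^j]$, and uncertain standard deviations, $\sigma \in [\sigma_{1,k}^j, \sigma_{2,k}^j]$, with $m_{1,k}^j \le m_{2,k}^j$ and $\sigma_{1,k}^j \le \sigma_{2,k}^j$. The lower and upper MFs (LMF and UMF) of $F_k^j$, $\underline{\mu}_{F_k^j}(x_k)$ and $\overline{\mu}_{F_k^j}(x_k)$, are given as [11]

$$\underline{\mu}_{F_k^j}(x_k) = \begin{cases} \phi(x_k; m_{2,k}^j, \sigma_{1,k}^j), & x_k \le (m_{1,k}^j + m_{2,k}^j)/2 \\ \phi(x_k; m_{1,k}^j, \sigma_{1,k}^j), & x_k > (m_{1,k}^j + m_{2,k}^j)/2 \end{cases} \qquad (4.7)$$

$$\overline{\mu}_{F_k^j}(x_k) = \begin{cases} \phi(x_k; m_{1,k}^j, \sigma_{2,k}^j), & x_k < m_{1,k}^j \\ \phi(x_k; m_{2,k}^j, \sigma_{2,k}^j), & x_k > m_{2,k}^j \\ 1, & \text{otherwise} \end{cases} \qquad (4.8)$$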
Table 4.1: Formulas for the firing degrees.

[Table columns: location of $x'_k$; firing-degree formula]


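Fuzzification: Given the input feature vector $X' = [x'_1, \ldots, x'_p]^T$, the system encodes $x'_k$ ($k = 1, \ldots, p$) as an interval type-2 fuzzy set $A_k$ whose MF is Gaussian, centered at $x'_k$ with uncertain standard deviation $\sigma \in [\sigma_{1,k}, \sigma_{2,k}]$. The LMF and UMF of $A_k$, $\underline{\mu}_{A_k}(x_k)$ and $\overline{\mu}_{A_k}(x_k)$, are given as

$$\underline{\mu}_{A_k}(x_k) = \exp\left\{-\frac{1}{2}\left(\frac{x_k - x'_k}{\sigma_{1,k}}\right)^2\right\} = \phi(x_k; x'_k, \sigma_{1,k}), \qquad (4.9)$$

$$\overline{\mu}_{A_k}(x_k) = \exp\left\{-\frac{1}{2}\left(\frac{x_k - x'_k}{\sigma_{2,k}}\right)^2\right\} = \phi(x_k; x'_k, \sigma_{2,k}). \qquad (4.10)$$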


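Fuzzy inference: In the interval type-2 implementation, the firing degree of each rule consists of two firing degrees, the lower and the upper firing degree. These are computed from the input and antecedent fuzzy sets by using the extended sup-star composition [7]; i.e., the lower and upper firing degrees of the $j$-th rule, $\underline{f}^j(X')$ and $\overline{f}^j(X')$, are computed as

$$\underline{f}^j(X') = \prod_{k=1}^{p} \sup_{x_k} \underline{\mu}_{A_k}(x_k)\, \underline{\mu}_{F_k^j}(x_k) \equiv \prod_{k=1}^{p} \underline{f}_k^j(x'_k), \qquad (4.11)$$

$$\overline{f}^j(X') = \prod_{k=1}^{p} \sup_{x_k} \overline{\mu}_{A_k}(x_k)\, \overline{\mu}_{F_k^j}(x_k) \equiv \prod_{k=1}^{p} \overline{f}_k^j(x'_k). \qquad (4.12)$$

The formulae for computing $\underline{f}_k^j(x'_k)$ and $\overline{f}_k^j(x'_k)$ are given in Table 4.1.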
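The sup operations in (4.11)-(4.12) can also be evaluated numerically, which gives a way to cross-check closed-form firing-degree formulas such as those of Table 4.1. The sketch below does this by brute force on a grid for a single feature, using the MF definitions (4.7)-(4.10); the grid extent and resolution are our assumptions, and the function names are hypothetical.

```python
import math

def phi(x, m, s):
    return math.exp(-0.5 * ((x - m) / s) ** 2)

def antecedent_lmf(x, m1, m2, s1):
    # Eq. (4.7): Gaussian with the farther mean and the smaller spread.
    return phi(x, m2, s1) if x <= (m1 + m2) / 2 else phi(x, m1, s1)

def antecedent_umf(x, m1, m2, s2):
    # Eq. (4.8): larger spread, flat top of height 1 between the two means.
    if x < m1:
        return phi(x, m1, s2)
    if x > m2:
        return phi(x, m2, s2)
    return 1.0

def firing_interval_1d(x_in, s_in1, s_in2, m1, m2, s1, s2, n_grid=20001):
    """Lower/upper firing degree for one feature, eqs. (4.11)-(4.12), via a grid sup.

    Numerical stand-in for closed-form formulas; not the method used in the report.
    """
    span = 6 * max(s_in2, s2)
    lo_x, hi_x = min(x_in, m1) - span, max(x_in, m2) + span
    xs = [lo_x + (hi_x - lo_x) * i / (n_grid - 1) for i in range(n_grid)]
    f_low = max(phi(x, x_in, s_in1) * antecedent_lmf(x, m1, m2, s1) for x in xs)
    f_up = max(phi(x, x_in, s_in2) * antecedent_umf(x, m1, m2, s2) for x in xs)
    return f_low, f_up
```

When the uncertainty intervals collapse ($m_{1,k}^j = m_{2,k}^j$, $\sigma_{1,k}^j = \sigma_{2,k}^j$, $\sigma_{1,k} = \sigma_{2,k}$), both firing degrees reduce to the type-1 closed form (4.4).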


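Output processing: For an interval type-2 FLS, output processing consists of type-reduction followed by defuzzification. Type-reduction obtains an interval output $[y_l(X'), y_r(X')]$ from the lower and upper firing degrees and the consequents by using the Karnik-Mendel iterative procedures [7]. The end points of the type-reduced set, $y_l(X')$ and $y_r(X')$, can be expressed as [11]

$$y_l(X') = \frac{\sum_{j=1}^{M} \left[\delta_l^j \overline{f}^j(X') + (1 - \delta_l^j)\underline{f}^j(X')\right] g^j}{\sum_{j=1}^{M} \left[\delta_l^j \overline{f}^j(X') + (1 - \delta_l^j)\underline{f}^j(X')\right]}, \qquad (4.13)$$

$$y_r(X') = \frac{\sum_{j=1}^{M} \left[\delta_r^j \overline{f}^j(X') + (1 - \delta_r^j)\underline{f}^j(X')\right] g^j}{\sum_{j=1}^{M} \left[\delta_r^j \overline{f}^j(X') + (1 - \delta_r^j)\underline{f}^j(X')\right]}, \qquad (4.14)$$

where $\delta_l^j$ and $\delta_r^j$ ($j = 1, \ldots, M$) are two indicator functions defined for each rule as

$$\delta_l^j = \begin{cases} 1, & g^j \le y_l(X') \\ 0, & \text{otherwise} \end{cases} \qquad (4.15)$$

$$\delta_r^j = \begin{cases} 1, & g^j \ge y_r(X') \\ 0, & \text{otherwise} \end{cases} \qquad (4.16)$$

Because $\delta_l^j$ and $\delta_r^j$ depend on the end points themselves, (4.13)-(4.16) must be solved iteratively, which is what the Karnik-Mendel procedures do.

Defuzzification of the type-2 FLS obtains a crisp output, $y(X')$, from the type-reduced set, i.e.,

$$y(X') = \frac{1}{2}\left[y_l(X') + y_r(X')\right]. \qquad (4.17)$$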
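A sketch of the Karnik-Mendel iterations for (4.13)-(4.16) and the defuzzified output (4.17). It follows the usual form of the procedure: sort the consequents, start from the midpoint firing degrees, and re-select the switch point until the end point stops changing. The function names are hypothetical, and positive lower firing degrees are assumed (true for Gaussian MFs).

```python
def km_endpoint(g, f_low, f_up, left, max_iter=100):
    """Karnik-Mendel iterations for y_l (left=True) or y_r (left=False), eqs. (4.13)-(4.16)."""
    order = sorted(range(len(g)), key=lambda j: g[j])
    gs = [g[j] for j in order]
    lo = [f_low[j] for j in order]
    up = [f_up[j] for j in order]
    w = [(a + b) / 2 for a, b in zip(lo, up)]  # start from midpoint firing degrees
    y = sum(wi * gi for wi, gi in zip(w, gs)) / sum(w)
    for _ in range(max_iter):
        k = sum(1 for gi in gs if gi <= y)     # number of consequents at or left of y
        # y_l: upper firing degrees for g^j <= y (delta_l = 1), lower ones elsewhere.
        # y_r: upper firing degrees for g^j > y (delta_r = 1), lower ones elsewhere.
        w = up[:k] + lo[k:] if left else lo[:k] + up[k:]
        y_new = sum(wi * gi for wi, gi in zip(w, gs)) / sum(w)
        if abs(y_new - y) < 1e-12:             # end point has converged
            return y_new
        y = y_new
    return y

def type2_output(g, f_low, f_up):
    """Type-reduced interval [y_l, y_r] and its midpoint, eq. (4.17)."""
    y_l = km_endpoint(g, f_low, f_up, left=True)
    y_r = km_endpoint(g, f_low, f_up, left=False)
    return y_l, y_r, (y_l + y_r) / 2
```

Because the optimum of (4.13) or (4.14) always lies at one of the $M + 1$ possible switch points, a brute-force search over those switch points gives an independent check of the iterations.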


4.2.1 Optimization

The parameters of a type-2 FLS are the input model parameters $\{\sigma_{1,k}, \sigma_{2,k}\}$, the antecedent parameters $\{m_{1,k}^j, m_{2,k}^j, \sigma_{1,k}^j, \sigma_{2,k}^j\}$ ($k = 1, \ldots, p$; $j = 1, \ldots, M$), and the consequent parameters $g^j$ ($j = 1, \ldots, M$); there are a total of $2p + M(4p + 1)$ design parameters. We used the steepest descent algorithm to tune these parameters. As for the type-1 FLS, the partial derivatives $\partial J / \partial \theta$ of the type-2 FLS are computed through the partial derivatives $\partial y / \partial \theta$. The formulae for $\partial y / \partial \theta$ for the input, antecedent and consequent parameters of the interval type-2 FLS are given in [11].
Chapter 5

Non-Hierarchical FL-RBC


The non-hierarchical FL-RBC, as developed in [11], can be considered a traditional fuzzy logic rule-based system, except that its consequents and output are two-dimensional.

The  rule  base  of  the  non-hierarchical  FL-RBC  has  nine  rules,  one  for  each  kind  of  vehicle, 
that  are  expressed  as: 

Rj:  IF  Xi  is  F{  and  . . .  and  a’n  is  F{y,  THEN  y  is  j  =  1, ...  ,9 

where  Rj  represents  the  rule  for  the  j-th  kind  of  vehicle,  F£  (k  —  1, . . . ,  11)  is  the  antecedent 
modifying  the  k-th  feature  variable  Xk,  and  [<?i, gi]*  is  the  consequent  modifying  the  output 
variable  y  and  is  modeled  as  a  two-dimensional  vector  of  crisp  numbers.  Given  an  unlabeled 
feature  vector  x'  =  [x'x , . . .  ,x'nf  (which  contains  crisp  measurements)  as  the  input,  through 
fuzzification,  inference  and  output  processing,  the  non-hierarchical  FL-RBC  obtains  a  crisp 
output  vector  [yi(x'),  y2(x')F-  The  non-hierarchical  FL-RBC  makes  a  final  decision  for  x' 
based  only  on  the  signs  of  [yi(x.') ,  y2(x.')]t  according  to  Table  5.1. 

Table 5.1: The classification decision for $\mathbf{x}'$ based on $[y_1(\mathbf{x}'), y_2(\mathbf{x}')]^T$.

| Decision | $y_1(\mathbf{x}')$ | $y_2(\mathbf{x}')$ |
|---|---|---|
| heavy-tracked | positive | positive |
| light-tracked | positive | negative |
| heavy-wheeled | negative | positive |
| light-wheeled | negative | negative |

The non-hierarchical FL-RBC architecture can be implemented using either type-1 or type-2 fuzzy set and fuzzy logic theory. In the type-1 non-hierarchical FL-RBC, each antecedent $F_k^j$ ($k = 1, \ldots, 11$ and $j = 1, \ldots, 9$) is modeled as a type-1 fuzzy set with a Gaussian MF centered at $m_k^j$ with standard deviation $\sigma_k^j$. The fuzzification process converts each input feature measurement $x'_k$ ($k = 1, \ldots, 11$) into a type-1 fuzzy set with a Gaussian MF centered at $x'_k$ with standard deviation $\sigma_k$. In the type-2 non-hierarchical FL-RBC, each antecedent is modeled as an interval type-2 fuzzy set with a Gaussian MF whose mean is uncertain, $m \in [m_{1,k}^j, m_{2,k}^j]$, and whose standard deviation is uncertain, $\sigma \in [\sigma_{1,k}^j, \sigma_{2,k}^j]$. The fuzzification process converts $x'_k$ into an interval type-2 fuzzy set with a Gaussian MF centered at $x'_k$ with uncertain standard deviation, $\sigma \in [\sigma_{1,k}, \sigma_{2,k}]$. Consequently, fuzzification, fuzzy inference and output processing are computed differently in the type-1 and type-2 non-hierarchical FL-RBCs; the details are given in Sections 4.1 and 4.2.
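The decision step of Table 5.1 depends only on the two signs. The following minimal Python sketch implements that mapping. The function name `decide` is ours. Treating an output of exactly zero as negative is an assumption, because the text does not cover that case.

```python
def decide(y1: float, y2: float) -> str:
    """Map the crisp outputs [y1, y2]^T to a vehicle category as in Table 5.1.

    Assumption: an output of exactly 0 is treated as negative.
    """
    if y1 > 0:                                   # y1 positive: a tracked vehicle
        return "heavy-tracked" if y2 > 0 else "light-tracked"
    return "heavy-wheeled" if y2 > 0 else "light-wheeled"   # y1 negative: wheeled
```

For example, `decide(0.4, -0.2)` returns `"light-tracked"`.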

The type-1 non-hierarchical FL-RBC has 227 parameters in total ($2 \times 9 = 18$ consequent, $2 \times 11 \times 9 = 198$ antecedent and 11 input parameters). These are the consequent parameters $g_1^j$ and $g_2^j$, the antecedent parameters $m_k^j$ and $\sigma_k^j$, and the input parameters $\sigma_k$ ($k = 1, \ldots, 11$ and $j = 1, \ldots, 9$). Because each rule corresponds to one kind of ground vehicle, the consequent and antecedent parameters of that rule are initialized from the label and prototypes of the corresponding vehicle. More specifically, the consequent $[g_1^j, g_2^j]^T$ of a rule is initialized as $[+1, +1]^T$ ($[+1, -1]^T$, $[-1, +1]^T$ or $[-1, -1]^T$) if the corresponding vehicle belongs to the heavy-tracked (light-tracked, heavy-wheeled or light-wheeled) category. The antecedent parameters of each rule are initialized from the prototypes of the corresponding vehicle, i.e.,

$$m_k^j(0) = \frac{1}{N_j} \sum_{\mathbf{x} \in V_j} x_k \qquad (5.1)$$

where $N_j$ is the number of prototypes belonging to $V_j$. The input parameters, $\sigma_k$, are initialized as:

$$\sigma_k(0) = \frac{1}{9} \sum_{j=1}^{9} \sigma_k^j(0) \qquad (5.3)$$
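The initialization in (5.1) and (5.3) can be sketched in Python as follows. Equation (5.2), which initializes the antecedent standard deviations $\sigma_k^j(0)$, is not legible in this copy. This sketch assumes it is the population standard deviation of the $k$-th feature over the prototypes of $V_j$. The function name `init_type1` and the dictionary layout are illustrative.

```python
from statistics import fmean, pstdev


def init_type1(prototypes):
    """Initial type-1 antecedent and input parameters from labeled prototypes.

    prototypes: dict mapping the rule/vehicle index j to a list of feature
    vectors (lists of floats), i.e. the N_j prototypes of vehicle V_j.
    Returns (m, s, sigma_in) with m[j][k] = m_k^j(0), s[j][k] = sigma_k^j(0)
    and sigma_in[k] = sigma_k(0).
    """
    m, s = {}, {}
    for j, X in prototypes.items():
        columns = list(zip(*X))                    # one tuple per feature k
        m[j] = [fmean(c) for c in columns]         # (5.1): mean over the N_j prototypes
        s[j] = [pstdev(c) for c in columns]        # assumed form of (5.2)
    n_features = len(next(iter(m.values())))
    # (5.3): average the antecedent standard deviations over all rules
    sigma_in = [fmean(s[j][k] for j in s) for k in range(n_features)]
    return m, s, sigma_in
```

With the 9 rules of the FL-RBC, the last line reproduces the $\frac{1}{9}\sum_{j=1}^{9}$ average of (5.3).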

The type-2 non-hierarchical FL-RBC has 436 parameters in total ($2 \times 9 = 18$ consequent, $4 \times 11 \times 9 = 396$ antecedent and $2 \times 11 = 22$ input parameters). These are the consequent parameters $g_1^j$ and $g_2^j$, the antecedent parameters $\{m_{1,k}^j, m_{2,k}^j, \sigma_{1,k}^j, \sigma_{2,k}^j\}$, and the input parameters $\{\sigma_{1,k}, \sigma_{2,k}\}$. They are initialized from the optimal parameters of the competing, fully designed type-1 non-hierarchical FL-RBC as follows:


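$$g_1^j(0) = g_1^j(\text{type-1 optimal}), \qquad g_2^j(0) = g_2^j(\text{type-1 optimal}) \qquad (5.4)$$

$$m_{1,k}^j(0) = m_k^j(\text{type-1 optimal}) - \gamma\,\sigma_k^j(\text{type-1 optimal}), \qquad m_{2,k}^j(0) = m_k^j(\text{type-1 optimal}) + \gamma\,\sigma_k^j(\text{type-1 optimal}) \qquad (5.5)$$

$$\sigma_{1,k}^j(0) = (1 - \gamma)\,\sigma_k^j(\text{type-1 optimal}), \qquad \sigma_{2,k}^j(0) = (1 + \gamma)\,\sigma_k^j(\text{type-1 optimal}) \qquad (5.6)$$

$$\sigma_{1,k}(0) = (1 - \gamma)\,\sigma_k(\text{type-1 optimal}), \qquad \sigma_{2,k}(0) = (1 + \gamma)\,\sigma_k(\text{type-1 optimal}) \qquad (5.7)$$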

where $\gamma \in (0, 1)$. In these equations, the left-hand sides are the initial values of the parameters of the type-2 non-hierarchical FL-RBC, and the right-hand sides are the optimal values of the parameters of the competing type-1 non-hierarchical FL-RBC.
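The initialization (5.4)-(5.7) and the parameter counts quoted above (227 for type-1, 436 for type-2) can be checked with the following Python sketch. The data layout, the function names and the default $\gamma = 0.1$ are illustrative assumptions. The text only requires $\gamma \in (0, 1)$.

```python
def init_type2(g1, g2, m, s, sigma_in, gamma=0.1):
    """Initial interval type-2 parameters from optimal type-1 ones, (5.4)-(5.7).

    g1[j], g2[j]: type-1 consequents; m[j][k], s[j][k]: type-1 antecedent
    means and standard deviations; sigma_in[k]: type-1 input standard deviations.
    """
    if not 0.0 < gamma < 1.0:
        raise ValueError("gamma must lie in (0, 1)")
    antecedents = {
        j: [(m[j][k] - gamma * s[j][k],     # m_{1,k}^j(0), (5.5)
             m[j][k] + gamma * s[j][k],     # m_{2,k}^j(0), (5.5)
             (1 - gamma) * s[j][k],         # sigma_{1,k}^j(0), (5.6)
             (1 + gamma) * s[j][k])         # sigma_{2,k}^j(0), (5.6)
            for k in range(len(m[j]))]
        for j in m
    }
    inputs = [((1 - gamma) * sk, (1 + gamma) * sk) for sk in sigma_in]   # (5.7)
    return dict(g1), dict(g2), antecedents, inputs   # (5.4): consequents copied


def n_params(n_rules=9, n_features=11, type2=False):
    """Number of tunable parameters: consequents + antecedents + inputs."""
    per = 2 if type2 else 1     # type-2 has two values per type-1 mean/std parameter
    return 2 * n_rules + 2 * per * n_rules * n_features + per * n_features
```

`n_params()` gives 227 and `n_params(type2=True)` gives 436, matching the totals in the text.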

During training, all the parameters of the type-1 or type-2 non-hierarchical FL-RBC are optimized with a steepest descent algorithm that minimizes the following classification error objective function:

$$J = \frac{1}{2} \sum_{\mathbf{x} \in \mathcal{V}_{train}} \left\{ [d_1 - y_1(\mathbf{x})]^2 + [d_2 - y_2(\mathbf{x})]^2 \right\} \qquad (5.8)$$

where $\mathcal{V}_{train}$ is the training set and $[d_1, d_2]^T$ is the desired classification result for $\mathbf{x}$. The desired result is $[+1, +1]^T$ ($[+1, -1]^T$, $[-1, +1]^T$ or $[-1, -1]^T$) if $\mathbf{x}$ belongs to the heavy-tracked (light-tracked, heavy-wheeled or light-wheeled) category.
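Once the desired outputs are tabulated, the objective (5.8) is straightforward to evaluate. The following minimal Python sketch assumes a factor of 1/2 in front of the sum, which only rescales the gradient used by steepest descent. The names `TARGETS` and `objective` are ours.

```python
# Desired outputs [d1, d2]^T for each category (same sign pattern as Table 5.1).
TARGETS = {
    "heavy-tracked": (1.0, 1.0),
    "light-tracked": (1.0, -1.0),
    "heavy-wheeled": (-1.0, 1.0),
    "light-wheeled": (-1.0, -1.0),
}


def objective(outputs, labels):
    """J in (5.8): outputs is a list of (y1, y2) pairs, labels the true categories."""
    total = 0.0
    for (y1, y2), label in zip(outputs, labels):
        d1, d2 = TARGETS[label]
        total += (d1 - y1) ** 2 + (d2 - y2) ** 2
    return 0.5 * total
```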
Chapter 6

Classifier  Fusion 


Wu and Mendel [11] showed that non-hierarchical FL-RBCs perform well when acoustic data are used to classify ground vehicles. We applied the same approach to seismic data. Because seismic data are inherently less reliable, we attempted to improve performance by developing several classifier designs and fusing their outputs. We used the Choquet Fuzzy Integral (CFI) as the tool for decision fusion. The CFI has been shown [1, 10] to achieve better classification performance than the worst component classifier. In this chapter we briefly review the theory behind the fusion algorithm and illustrate the fusion architectures that we employed.

6.1  Decision  Fusion  using  the  CFI 

The fuzzy integral is a non-linear approach to combining multiple sources of uncertain information (e.g., in pattern recognition applications, where results from multiple classifiers are combined). The CFI is a specific type of fuzzy integral that combines information from multiple sources by taking into account a subjective evaluation of the worth of each source. It relies on the concept of a fuzzy measure, which we describe next.

6.1.1  Fuzzy  Measures 

The fuzzy integral relies on the concept of a fuzzy measure, which in turn is a generalization of the concept of a probability measure. Consider a finite universal set $Y = \{y_1, \ldots, y_n\}$ and let $P(Y)$ be the power set of $Y$. For our application, $Y$ is the set of classifiers to be fused. A fuzzy measure over the set $Y$ is the set function

$$g: P(Y) \to [0, 1] \qquad (6.1)$$
such that:

1. $g(\emptyset) = 0$, $g(Y) = 1$

2. If $A, B \in P(Y)$ and $A \subseteq B$, then $g(A) \le g(B)$.

Usually $g(A)$ is interpreted as the contribution of the subset $A$ within the set $Y$. Sugeno [8] introduced $\lambda$-fuzzy measures, which specify the importance of the union of disjoint subsets $A$ and $B$. For these measures, the fuzzy measure of the union is given by

$$g_\lambda(A \cup B) = g_\lambda(A) + g_\lambda(B) + \lambda\, g_\lambda(A)\, g_\lambda(B), \quad \lambda > -1 \qquad (6.2)$$

for all $A, B \subseteq Y$ with $A \cap B = \emptyset$.

Given the fuzzy densities $g^1, g^2, \ldots, g^n$ (the measures of the singleton sets $\{y_i\}$) for a set of $n$ classifiers, we can find $\lambda$ by solving the equation

$$\lambda + 1 = \prod_{i=1}^{n} (1 + \lambda g^i) \qquad (6.3)$$

Refer to Appendix A to see how (A.8) is obtained. Appendix A, which is a tutorial, also contains a detailed list of the properties of fuzzy measures.
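In practice, $\lambda$ has to be found numerically. For densities $0 < g^i < 1$ and $n \ge 2$, (6.3) has exactly one root in $(-1, \infty)$ other than $\lambda = 0$. That root is positive when $\sum_i g^i < 1$ and negative when $\sum_i g^i > 1$; when the densities sum to 1, $\lambda = 0$ and the measure is additive. The Python sketch below finds the root by plain bisection, which is our choice rather than a method given in the text. It also builds $g_\lambda$ of a set by applying (6.2) one element at a time, so that $g_\lambda(Y) = 1$ can be checked.

```python
import math


def solve_lambda(densities, tol=1e-12, max_iter=200):
    """Root of (6.3), lambda + 1 = prod_i (1 + lambda * g^i), with lambda > -1.

    Assumes 0 < g^i < 1 for every density and at least two densities.
    """
    total = sum(densities)
    if math.isclose(total, 1.0, abs_tol=1e-12):
        return 0.0                               # additive (probability) measure

    def f(lam):
        return math.prod(1.0 + lam * g for g in densities) - (1.0 + lam)

    if total > 1.0:
        lo, hi = -1.0, -1e-9                     # f(-1) > 0 and f < 0 just left of 0
    else:
        lo, hi = 1e-9, 1.0                       # f < 0 just right of 0 ...
        while f(hi) < 0.0:
            hi *= 2.0                            # ... and f > 0 for large lambda
    for _ in range(max_iter):                    # plain bisection on [lo, hi]
        mid = 0.5 * (lo + hi)
        if (f(mid) > 0.0) == (f(lo) > 0.0):
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)


def lambda_measure(densities, lam):
    """g_lambda of the set whose singleton densities are given, via repeated (6.2)."""
    g = 0.0
    for gi in densities:
        g = g + gi + lam * g * gi
    return g
```

For example, densities 0.3 and 0.4 give $\lambda = 2.5$, and then $g_\lambda(Y) = 0.3 + 0.4 + 2.5 \times 0.3 \times 0.4 = 1$.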

6.1.2  The  Choquet  Fuzzy  Integral  (CFI) 

The Choquet fuzzy integral (CFI) is the integral of a measurable function $h: Y \to [0, 1]$ with respect to a fuzzy measure $g_\lambda$. For our decision fusion application, $Y$ is a set of classifiers and $h(y_i)$ is the soft output of classifier $y_i$, i.e., the classifier's confidence or evidence grade that an input sample is from a particular class. In general, $Y = \{y_1, \ldots, y_n\}$ is a set of information sources and $h(y_i)$ is the confidence grade of source $i$ that a particular hypothesis is true.

Under this framework, the CFI is defined as

$$E_g(h) = \int h \circ g = \sum_{i=1}^{n} h(y_i)\,[g(A_i) - g(A_{i-1})] \qquad (6.4)$$

where the sources are ordered so that $h(y_1) \ge h(y_2) \ge \cdots \ge h(y_n)$, and $A_0 = \emptyset$ so that $g(A_0) = 0$. Equivalently, $E_g(h) = \sum_{i=1}^{n} [h(y_i) - h(y_{i+1})]\, g(A_i)$ with $h(y_{n+1}) = 0$. The sets $A_i$ form a nested sequence that starts from $A_1 = \{y_1\}$ and adds the elements $y_2, \ldots, y_n$ one at a time, so that $A_i = \{y_1, \ldots, y_i\} \subseteq Y$. For more details on the CFI, its properties and heuristic interpretations, and further examples, see Appendix A.

In this form of the CFI, each $h(y_i)$ is a crisp real value and the ordered set $h$ is considered to be a type-1 fuzzy set whose values are monotonically non-increasing and lie in $[0, 1]$. When the classifiers are constructed using type-2 fuzzy logic, the output $h(y_i)$ of each classifier is an interval type-2 fuzzy set. We denote the end-points of this interval set by $h_{il}$ and $h_{ir}$. Fusing these interval type-2 fuzzy sets yields the extended version of the CFI [1]:

$$E_g(h) = \left[\sum_{i=1}^{n} h_{il}\,[g(A_i) - g(A_{i-1})],\; \sum_{i=1}^{n} h_{ir}\,[g(A_i) - g(A_{i-1})]\right] \qquad (6.5)$$
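Equations (6.4) and (6.5) translate directly into code. In this Python sketch, the fuzzy measure is passed in as a function on subsets, so it can be a $\lambda$-measure or any other fuzzy measure. For the interval case, we read (6.5) as applying (6.4) separately to the left and right end-points, each sorted in its own descending order. This is our interpretation; because the CFI is monotone in $h$, it yields the smallest and largest fused values over the intervals. The function names are illustrative.

```python
def choquet(h, g):
    """Discrete Choquet integral (6.4).

    h: dict mapping each source y_i to its confidence h(y_i) in [0, 1].
    g: callable mapping a frozenset of sources to its fuzzy measure,
       with g(frozenset()) == 0.
    """
    total, prev, A = 0.0, 0.0, frozenset()
    for y in sorted(h, key=h.get, reverse=True):   # h(y_1) >= ... >= h(y_n)
        A = A | {y}                                # A_i = {y_1, ..., y_i}
        gA = g(A)
        total += h[y] * (gA - prev)                # h(y_i) [g(A_i) - g(A_{i-1})]
        prev = gA
    return total


def choquet_interval(h_left, h_right, g):
    """Interval output for (6.5): each end-point fused with its own ordering (assumption)."""
    return choquet(h_left, g), choquet(h_right, g)
```

With an additive measure, the CFI reduces to a weighted average. With $g(A) = 1$ for every non-empty $A$ it returns $\max_i h(y_i)$, and with $g(A) = 1$ only for $A = Y$ it returns $\min_i h(y_i)$.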


6.2  Architectures  for  Decision  Fusion 

The  CFI  can  be  applied  in  different  ways  to  the  multi-category  classification  problem.  We 
focus  on  the  application  of  the  CFI  for  decision  fusion. 


6.2.1  Internal  Decision  Fusion 

During  the  previous  years  of  our  study,  classifier  architectures  were  developed  based  on 
acoustic  data  to  classify  ground  vehicles  into  the  four  categories.  These  classifiers  were 
evaluated  using  the  leave-one-out  and  cross-validation  methods.  In  each  of  these  methods 
numerous  classifier  designs  emerged,  the  results  of  which  were  combined  using  a  majority 
vote.  We  propose  to  use  the  CFI  to  fuse  the  outputs  of  these  classifiers. 

Since the non-hierarchical FL-RBC has been found to be the most effective classifier architecture [11], it is the outputs of this classifier that are fused. As mentioned in Chapter 5, the non-hierarchical FL-RBC can be implemented using type-1 or type-2 fuzzy sets. In the former case (A.14) is used for fusion, while in the latter case (6.5) is used.

Figure 6.1 shows a block diagram of this fusion architecture. To illustrate the use of the CFI for fusion more clearly, we consider a simple binary classification example.

Fusion  of  Binary  Classifiers 

Let $X = \{x_1, x_2, \ldots, x_{11}\}$ be the feature set extracted from the data. In each of the rules $l = 1, \ldots, 9$, the antecedents $F_k^l$ ($k = 1, \ldots, 11$, one for each feature) are modeled as type-1 fuzzy sets with membership functions (MFs) $\mu_{F_k^l}(x_k)$, and the consequents $q^l$ are modeled as crisp numbers. Given an extracted feature vector $\mathbf{x}' = [x'_1, \ldots, x'_{11}]$, the type-1 FL-RBC models each $x'_k$ as a type-1 fuzzy set $X_k$ and computes the firing degree $f^l(\mathbf{x}')$ of each rule. The rules are then combined through defuzzification to obtain a crisp output $y(\mathbf{x}')$. The decision about $\mathbf{x}'$ depends on the sign of $y(\mathbf{x}')$: if $y(\mathbf{x}')$ is positive, $\mathbf{x}'$ is classified as a tracked vehicle, and if $y(\mathbf{x}')$ is negative, $\mathbf{x}'$ is classified as a wheeled vehicle.
Figure 6.1: Block diagram representation of classifier fusion using the CFI.

The interval type-2 FL-RBC has the same rule structure as the type-1 FL-RBC, but the antecedents $F_k^l$ ($k = 1, \ldots, 11$) and the feature sets are modeled as interval type-2 fuzzy sets. The consequent $q^l$ is still modeled as a crisp number. Given an extracted feature vector $\mathbf{x}' = [x'_1, \ldots, x'_{11}]$, the FLS models each $x'_k$ as an interval type-2 fuzzy set $X_k$ and computes the lower and upper firing degrees, $\underline{f}^l(\mathbf{x}')$ and $\overline{f}^l(\mathbf{x}')$, of each rule. The rules are combined through type reduction to obtain a type-reduced set $[y_l(\mathbf{x}'), y_r(\mathbf{x}')]$. Finally, the type-2 FL-RBC defuzzifies the type-reduced set to obtain a crisp output $y(\mathbf{x}') = [y_l(\mathbf{x}') + y_r(\mathbf{x}')]/2$. As with the type-1 FL-RBC, the decision about $\mathbf{x}'$ depends on the sign of $y(\mathbf{x}')$: if $y(\mathbf{x}')$ is positive, $\mathbf{x}'$ is classified as a tracked vehicle, and if $y(\mathbf{x}')$ is negative, $\mathbf{x}'$ is classified as a wheeled vehicle.

Cross-validation is used to test the classifiers, and it produces a number of designs for both the type-1 and the type-2 classifiers. The parameters of all classifiers are optimized using the training data and a steepest descent algorithm. After training, the performance of the $k$-th classifier is characterized by its false alarm rate (FAR), $p_k$. Given a new input feature vector $\mathbf{x}' = [x'_1, \ldots, x'_{11}]$, both the type-1 and type-2 FL-RBCs generate an output $y(\mathbf{x}')$ for each of their classifier designs.

The CFI can be used to combine all outputs of the type-1 and type-2 classifiers with respect to their corresponding FARs. The numerical outputs, $y(\mathbf{x}')$, of the classifier designs are the functions $h_k(X)$ ($k = 1, \ldots, n$) for a given set of input feature vectors $X$, and the fuzzy densities $g^k$ correspond to the performance of each classifier design. If the probability of classification error on the training data for design $y_k$ is $Q(y_k)$, then the fuzzy density for classifier $y_k$ is $g^k = Q(y_k)$. The outputs of the individual classifier designs are combined using the CFI as in (A.14). The resulting final output $y(\mathbf{x}')$ is thresholded to decide whether the input feature vector $\mathbf{x}'$ corresponds to a wheeled or a tracked vehicle.

Figure 6.2: Block diagram of the decision strategy of the fused classifier.

6.2.2  External  Decision  Fusion 

Since we have access to both acoustic and seismic data, we can fuse classifiers built on each. We first evaluate the performance of the classifiers based on acoustic data and on seismic data, and we use their individual performances to define the $\lambda$-fuzzy measure in the CFI. The outputs of these classifiers are then fused, and the resulting fused classifier is baselined against the individual classifiers. Figure 6.2 shows a block diagram illustrating external decision fusion.
Chapter 7

Experiments  and  Results 


To evaluate the Bayesian classifier and the FL-RBCs, we performed experiments that test classifiers based on seismic data as well as fused classifiers. Over the past few years, Wu and Mendel [11] established a testing methodology, which we use here for consistency and to allow direct comparison. The following sections describe these experiments in detail.

7.1  Leave-One-Out  Experiment [11] 

The leave-one-out experiment determines the performance of the classifiers based on the seismic data alone. It consists of 88 classifier designs, one for each of the 88 available vehicle runs. In the $t$-th design, the $t$-th run is left out for testing, and the remaining 87 runs are used for training. The training data are used to optimize the classifier, and the testing data are used to assess its performance. We used CPA-based prototypes for this experiment. FL-RBCs using type-1 and type-2 fuzzy logic and the Bayesian classifier were tested using this method.

So far, we have designed the classifier so that the decision for each prototype depends only on its own features; we call this the non-adaptive operating mode. In previous years, we found that even a simple adaptive operating mode based on majority voting can improve classification performance [11]. In this adaptive operating mode, the classification decision for each prototype depends not only on the prototype itself but also on the decisions made on other prototypes of the same run. More specifically, let $\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_n$ be the prototypes of the same run and $s_1, s_2, \ldots, s_n$ their corresponding non-adaptive decisions. The adaptive decision for $\mathbf{x}_n$, denoted $s''_n$, is obtained by taking a majority vote on $s_1, s_2, \ldots, s_{n-1}, s_n$. That is, $s''_n$ is heavy-tracked (light-tracked, heavy-wheeled or light-wheeled) only if more than half of $s_1, s_2, \ldots, s_n$ are heavy-tracked (light-tracked, heavy-wheeled or light-wheeled). We conducted the leave-one-out experiment on the seismic data in both the non-adaptive and the adaptive operating modes. The procedure is as follows:

for t = 1:88   // 88 designs in total
{
    Leave out the prototypes of the t-th run for testing;
    Use the prototypes of the remaining runs for training;

    Initialize the parameters of the type-1 fuzzy rule-based classifier;
    for n = 1:400   // train and test for 400 epochs
    {
        Optimize the type-1 FL-RBC using the training prototypes;
        Test the type-1 FL-RBC using the testing prototypes for the non-adaptive operating mode;
        Save the testing error rate as e1(t,n);
    }
    Save the minimum of e1(t,n) for n = 1,...,400 as p1(t);
    Save the optimal parameters as θ1(t);
    Test the optimal type-1 FL-RBC using the same testing prototypes for the adaptive
        operating mode, and save the classification error as α1(t);

    Initialize the parameters of the type-2 FL-RBC based on θ1(t);
    for n = 1:400   // train and test for 400 epochs
    {
        Optimize the type-2 FL-RBC using the training prototypes;
        Test the type-2 FL-RBC using the testing prototypes for the non-adaptive operating mode;
        Save the testing error rate as e2(t,n);
    }
    Save the minimum of e2(t,n) for n = 1,...,400 as p2(t);
    Save the optimal parameters as θ2(t);
    Test the optimal type-2 FL-RBC using the same testing prototypes for the adaptive
        operating mode, and save the classification error as α2(t);
}

Compute the mean and standard deviation of p1(t), α1(t), p2(t) and α2(t) for t = 1,...,88.
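The adaptive operating mode described above amounts to a running majority vote over the non-adaptive decisions of one run. The following is a minimal Python sketch. Returning `None` when no category holds a strict majority is our assumption, because the text does not say what happens in that case.

```python
from collections import Counter


def adaptive_decisions(decisions):
    """Adaptive-mode decisions s_1'', ..., s_N'' for the prototypes of one run.

    decisions: the non-adaptive decisions s_1, ..., s_N in prototype order.
    The n-th adaptive decision is the category held by more than half of
    s_1, ..., s_n; None means no category has a strict majority (assumption).
    """
    counts = Counter()
    adaptive = []
    for n, s in enumerate(decisions, start=1):
        counts[s] += 1
        label, votes = counts.most_common(1)[0]
        adaptive.append(label if votes > n / 2 else None)
    return adaptive
```

For example, `adaptive_decisions(["heavy-tracked", "light-tracked", "heavy-tracked"])` gives `["heavy-tracked", None, "heavy-tracked"]`.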

The results of this experiment are summarized in Table 7.1.

Table 7.1: Estimates of the mean and standard deviation (SD) of the testing errors in the leave-one-out experiment using only seismic data.

| Classifier | Non-adaptive mode: Average | Non-adaptive mode: SD | Adaptive mode: Average | Adaptive mode: SD |
|---|---|---|---|---|
| Bayesian | 0.7891 | 0.0719 | 0.8587 | 0.1164 |
| Non-hierarchical Type-1 | 0.2783 | 0.2685 | 0.2524 | 0.2378 |
| Non-hierarchical Type-2 | 0.2738 | 0.2680 | 0.2429 | 0.2184 |

Observe that:

• The Bayesian classifier performed worse in the adaptive operating mode. Because the adaptive mode is essentially a majority voting scheme, it is effective only when the error rate is below 0.5 [11]. The Bayesian classifier's non-adaptive error rate (0.7891) is well above 0.5, so majority voting tends to reinforce its wrong decisions.

• Both FL-RBCs performed much better than the Bayesian classifier in both the non-adaptive and the adaptive operating modes.

• The type-2 non-hierarchical classifier performed marginally better than its type-1 equivalent.

• The classifiers based on seismic data performed significantly worse than the classifiers based on acoustic data in the corresponding leave-one-out experiments [11]. We conclude that seismic data is an unreliable source for feature extraction and classification.


7.2  Cross-Validation  Experiment [11] 

The training and testing data were grouped on the basis of prototypes. The prototypes were randomly divided into 10 even folds, each containing about 10% of the prototypes, and 10 designs were implemented. In the $l$-th design, the prototypes of the $l$-th fold were set aside for testing and the prototypes of the remaining folds were used for training. We used CPA-based prototypes for this experiment. FL-RBCs using type-1 and type-2 fuzzy logic and the Bayesian classifier were tested using this method. The procedure is as follows:

Index the prototypes of all runs;
Randomly permute the index of all prototypes;
Divide the prototypes into 10 even folds;
for t = 1:10   // 10 designs in total
{
    Leave out the prototypes of the t-th fold for testing;
    Use the prototypes of the remaining folds for training;

    Initialize the parameters of the type-1 fuzzy rule-based classifier;
    for n = 1:1000   // train and test for 1000 epochs
    {
        Optimize the type-1 FL-RBC using the training prototypes;
        Test the type-1 FL-RBC using the testing prototypes for the non-adaptive operating mode;
        Save the testing error rate as e1(t,n);
    }
    Save the minimum of e1(t,n) for n = 1,...,1000 as p1(t);
    Save the optimal parameters as θ1(t);

    Initialize the parameters of the type-2 FL-RBC based on θ1(t);
    for n = 1:1000   // train and test for 1000 epochs
    {
        Optimize the type-2 FL-RBC using the training prototypes;
        Test the type-2 FL-RBC using the testing prototypes for the non-adaptive operating mode;
        Save the testing error rate as e2(t,n);
    }
    Save the minimum of e2(t,n) for n = 1,...,1000 as p2(t);
    Save the optimal parameters as θ2(t);
}

Compute the mean and standard deviation of p1(t) and p2(t) for t = 1,...,10.
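The data-splitting part of this procedure can be sketched in Python as follows. Dealing the shuffled indices round-robin into folds is our way of making the 10 folds "even", so that fold sizes differ by at most one. The fixed seed, the 0-based fold numbering and the function names are illustrative assumptions.

```python
import random


def make_folds(n_prototypes, n_folds=10, seed=0):
    """Randomly permute prototype indices and deal them into n_folds near-equal folds."""
    idx = list(range(n_prototypes))
    random.Random(seed).shuffle(idx)              # "randomly permute the index"
    return [idx[f::n_folds] for f in range(n_folds)]


def cv_designs(folds):
    """Yield (t, train, test) for the t-th design: fold t tests, the others train."""
    for t, test in enumerate(folds):
        train = [i for f, fold in enumerate(folds) if f != t for i in fold]
        yield t, train, test
```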

The classifiers in this experiment were trained for 1000 epochs to account for the relatively smaller number of training prototypes compared with the leave-one-out experiment. The results of this experiment are summarized in the first column of Table 7.2. Observe that:

• The FL-RBCs performed much better than the Bayesian classifier. We conclude that when non-stationary, unreliable data are used for classification, it is sub-optimal to model the data probabilistically.

• The type-2 FL-RBC performed marginally better than its type-1 counterpart.

• The results from the cross-validation experiment are consistent with the results from the leave-one-out experiment.




Table 7.2: Estimates of the mean and standard deviation (SD) of the testing errors in the cross-validation experiment using only seismic data.

| Classifier | Seismic data: Average | Seismic data: SD | Fused classifier: Average | Fused classifier: SD |
|---|---|---|---|---|
| Bayesian | 0.7248 | 0.0642 | 0.7315 | 0.054 |
| Non-hierarchical Type-1 | 0.3883 | 0.0688 | 0.3324 | 0.068 |
| Non-hierarchical Type-2 | 0.3738 | 0.0687 | 0.3629 | 0.0642 |

7.3  Fusion  using  the  CFI 

In this experiment, we used the soft outputs of the classifiers (both Bayesian and FL-RBC). For the Bayesian classifier, the log-likelihoods were used as the soft outputs to be fused. For the type-1 FL-RBC, $y_1(\mathbf{x}')$ and $y_2(\mathbf{x}')$ were taken as the soft outputs and fused using the CFI, and the fused outputs were then defuzzified to obtain a crisp output. In the type-2 FL-RBC case, the classifier output after type reduction is an interval type-2 fuzzy set. Therefore, the extended CFI described in (6.5) was used to fuse the classifier outputs.

This experiment is similar to the previous cross-validation experiment. Prototypes were randomly divided into 10 even folds, each containing about 10% of the prototypes. We implemented 10 fused classifier designs. In the $l$-th design, the $l$-th fold was set aside for testing, and the cross-validation experiment described in Section 7.2 was conducted on the 9 remaining folds.

We designed 9 non-hierarchical FL-RBCs using the 9 remaining folds. The $r$-th fold ($r = 1, \ldots, 9$) was left out for testing, and prototypes from the remaining 8 folds were used for training. Training was done for 1000 epochs, and the optimal design (lowest error rate) was recorded; thus, 9 such classifier designs were stored for each $l$ ($l = 1, \ldots, 10$). They were then fused and tested on the $l$-th fold, which had originally been left out. Because $l = 1, \ldots, 10$, this produced 10 fused classifier designs, whose average error rate and standard deviation were recorded. Since 9 classifier designs were fused for each $l$, a total of 90 non-hierarchical FL-RBCs were designed for this experiment. The procedure is as follows:


Index the prototypes of all runs;
Randomly permute the index of all prototypes;
Divide the prototypes into 10 even folds;
for l = 1:10   // 10 fused classifier designs in total
{
    Leave out the prototypes of the l-th fold for testing;
    Use the prototypes of the remaining folds to generate the classifier designs to be fused;

    for r = 1:9   // 9 FL-RBC designs for each l
    {
        Leave out the prototypes of the r-th fold for testing;
        Use the prototypes of the remaining 8 folds for training;

        Initialize the parameters of the type-1 fuzzy rule-based classifier;
        for n = 1:1000   // train and test for 1000 epochs
        {
            Optimize the type-1 FL-RBC using the training prototypes;
            Test the type-1 FL-RBC using the testing prototypes for the non-adaptive operating mode;
            Save the testing error rate as e1(r,n);
        }
        Save the minimum of e1(r,n) for n = 1,...,1000 as p1(r);
        Save the optimal parameters as θ1(r);

        Initialize the parameters of the type-2 FL-RBC based on θ1(r);
        for n = 1:1000   // train and test for 1000 epochs
        {
            Optimize the type-2 FL-RBC using the training prototypes;
            Test the type-2 FL-RBC using the testing prototypes for the non-adaptive operating mode;
            Save the testing error rate as e2(r,n);
        }
        Save the minimum of e2(r,n) for n = 1,...,1000 as p2(r);
        Save the optimal parameters as θ2(r);
    }

    Use the minimum error rates p1(r) and p2(r) as the fuzzy densities, and θ1(r)
        and θ2(r) as the classifier parameters, for r = 1,...,9;
    Fuse the 9 classifiers and test on the l-th fold;
    Save the error rates as pfe1(l) and pfe2(l);
}

Compute the mean and standard deviation of pfe1(l) and pfe2(l) for l = 1,...,10.
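The nested splitting in this procedure (10 outer folds, with 9 inner designs per outer fold) can be enumerated with a short Python generator. It reproduces the 90 $(l, r)$ design pairs mentioned above. Folds are numbered from 0 here, and the function name is illustrative.

```python
def nested_designs(folds):
    """Yield (l, r, train, inner_test, outer_test) for the Section 7.3 scheme.

    Outer loop: fold l is held out to test the fused classifier.
    Inner loop: each of the remaining folds r in turn tests one FL-RBC design
    trained on the other folds; those designs are fused for fold l.
    """
    for l, outer_test in enumerate(folds):
        inner = [fold for i, fold in enumerate(folds) if i != l]
        for r, inner_test in enumerate(inner):
            train = [p for j, fold in enumerate(inner) if j != r for p in fold]
            yield l, r, train, inner_test, outer_test
```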

The results of this experiment are given in the second column of Table 7.2. Observe that:

• The Bayesian classifier performs poorly, with an average error rate of 0.7248 without fusion and 0.7315 with fusion. Using the CFI as a fusion algorithm did not improve its performance.

• For the FL-RBCs, fusing the classifier designs with the CFI gave only a slight improvement: from 0.3883 to 0.3324 for type-1, and from 0.3738 to 0.3629 for type-2. Since the variance of the error rates of the individual designs is already low, this is an expected result.

• We conclude that classification based on seismic data alone does not yield good performance. The inherent unreliability of the data, and consequently of the features, is severe.

The same experiment was repeated using only the acoustic data, and 10 fused classifier designs were recorded. Since the $l$-th fold was left out in the $l$-th fused classifier design, the classifiers based on the acoustic and the seismic data can be fused and tested on the $l$-th fold. The first column of Table 7.3 shows the performance of the different classifier designs in the cross-validation experiment using only the acoustic data. The second column shows the results for a fused classifier that combines classifiers based on both acoustic and seismic data.

Table 7.3: Estimates of the mean and standard deviation (SD) of the testing errors in the cross-validation experiment using both acoustic and seismic data.

| Classifier | Acoustic data: Average | Acoustic data: SD | Fused classifier: Average | Fused classifier: SD |
|---|---|---|---|---|
| Bayesian | 0.2231 | 0.0097 | 0.3541 | 0.082 |
| Non-hierarchical Type-1 | 0.1428 | 0.012 | 0.1697 | 0.106 |
| Non-hierarchical Type-2 | 0.1373 | 0.011 | 0.1642 | 0.12 |

Observe that:

• Acoustic data alone is a more reliable source for classification than acoustic plus seismic data. Adding the seismic-data classifiers increased the average error rate of every classifier.

• The performance of the classifiers based on acoustic data was reported in [11]; it is much better than that of the classifiers based on seismic data for all operating modes and algorithms.

• Because of the large difference in performance between the two kinds of classifiers, the fused classifier did not improve performance. We conclude that using seismic data for multi-category classification is worse than using acoustic data alone.


Chapter  8 
Conclusions 


We have developed binary and multi-category classifiers based on seismic data to classify heavy-tracked, light-tracked, heavy-wheeled and light-wheeled ground vehicles, and we have baselined their performance against a Bayesian classifier. We focused on data collected in the normal terrain.

We also developed fusion algorithms for type-1 and type-2 Fuzzy Logic Rule-Based Classifiers (FL-RBCs) based on the Choquet Fuzzy Integral (CFI). We conducted leave-one-out and cross-validation experiments to evaluate the performance of the classifiers and the effectiveness of seismic data for classification. We also conducted modified cross-validation experiments to evaluate the performance of fused classifiers (both seismic and acoustic) and to determine whether performance could be improved.

Our results show that seismic data can be very effective for binary classification when energy is used as a feature. For multi-category classification, however, the classifiers based on seismic data performed poorly compared with the classifiers based on acoustic data. Testing the classifiers in the adaptive mode in the leave-one-out experiment produced only a marginal performance gain. Fusing the different classifier designs that use only seismic data also did not show any appreciable improvement in performance. Performance actually deteriorated when classifiers using only seismic data were fused with those using only acoustic data. We conclude, therefore, that seismic data is a poor choice for multi-category classification of ground vehicles.

If seismic data are available, a two-stage classification process may be the optimal approach. In this process, we first classify tracked versus wheeled vehicles using seismic data, and then classify heavy-tracked versus light-tracked (heavy-wheeled versus light-wheeled) vehicles using acoustic data.

We note that the FL-RBCs performed better than the Bayesian classifier in all the experiments conducted. This shows that FL-RBCs are better suited to handling uncertainties in the data.
Appendix A

The  Choquet  and  Sugeno  Fuzzy 
Integrals:  A  Tutorial 

A.1 Introduction

The fuzzy integral is a non-linear approach to combining multiple sources of uncertain information (e.g., in pattern recognition applications, where results from multiple classifiers are combined). The function being integrated provides a confidence value for each information source for a particular hypothesis, and the integral is evaluated over the set of information sources. The Choquet (and Sugeno) fuzzy integral is a specific type of fuzzy integral that combines information from multiple sources by taking into account a subjective evaluation of the worth of each source.


A.2 Fuzzy Measures

The fuzzy integral relies on the concept of a fuzzy measure, which in turn is a generalization of the concept of a probability measure. Consider a finite universal set $X = \{x_1, \ldots, x_n\}$, which can be interpreted in a number of ways, e.g.:

• X is a set of expert judgments concerned with decision making.

• X is a set of attributes or features. Each element of X is used to calculate a degree of membership for an object $u \in U$ with respect to a class $\omega \in \Omega$.

• X is a set of classifier outputs. This is different from the previous interpretation in that the classifier outputs can be represented as confidence levels for associating an object with a particular class.
Let $\mathcal{P}(X)$ be the power set of X. A fuzzy measure over the set X is a set function

$$g : \mathcal{P}(X) \to [0, 1] \tag{A.1}$$

such that

1. $g(\emptyset) = 0$ and $g(X) = 1$;

2. if $A, B \in \mathcal{P}(X)$ and $A \subseteq B$, then $g(A) \le g(B)$.

Usually $g(A)$ is viewed as the importance or power of an individual source or subset of sources A within the set X.

Another way of interpreting fuzzy measures is to consider the contribution of an element to a union or subset of elements. The contribution, or added value, of element $x_i$ to a set A is defined by $g(A \cup \{x_i\}) - g(A)$.

According to Sugeno [8], the measure $g(A \cup B)$, which specifies the importance of the union of disjoint subsets A and B, cannot in general be completely ascertained from the component measures $g(A)$ and $g(B)$. Consequently, he introduced $\lambda$-fuzzy measures (also called "Sugeno measures"), which satisfy the additional property

$$g_\lambda(A \cup B) = g_\lambda(A) + g_\lambda(B) + \lambda\, g_\lambda(A)\, g_\lambda(B), \quad \lambda > -1 \tag{A.2}$$

for all $A, B \subseteq X$ with $A \cap B = \emptyset$. Sugeno fuzzy measures are typically denoted as $g_\lambda$; however, since they have found widespread use in applications (especially those involving fuzzy integrals), it has become common practice to denote them simply as g. In this report, g and $g_\lambda$ are used interchangeably.

The following are important properties of fuzzy measures:

1. If $\lambda = 0$, then the fuzzy measure $g_\lambda$ becomes a probability measure, in that $g(A \cup B) = g(A) + g(B)$. If $\lambda < 0$, then $g_\lambda$ is sub-additive, in that $g(A \cup B) < g(A) + g(B)$, and if $\lambda > 0$, then $g_\lambda$ is super-additive, in that $g(A \cup B) > g(A) + g(B)$.

2. Let X be a finite set of information sources $X = \{x_1, \ldots, x_n\}$ and let $g^i = g_\lambda(\{x_i\})$. The values $g^1, g^2, \ldots, g^n$ are called fuzzy densities¹ and represent the importance of the individual information sources.

3. Let

$$A_i = \{x_1, \ldots, x_i\} \subseteq X \tag{A.3}$$

We can then form a sequence of nested sets $A_1, \ldots, A_n$, starting from $A_1 = \{x_1\}$ and subsequently adding in the elements $x_2, \ldots, x_n$ one at a time (note that $A_n = X$ and $A_0 = \emptyset$). The measure $g(A_i)$ is calculated from the following recursive formula, which can be derived from (A.2) (see Section B.1):

$$g(A_i) = g(A_{i-1} \cup \{x_i\}) = g^i + g(A_{i-1}) + \lambda\, g^i g(A_{i-1}), \quad 1 < i \le n \tag{A.4}$$

where $g(A_1) = g^1$ and $g(A_n) = g(X)$.

¹They are called "fuzzy" because $g^1, \ldots, g^n$ are the values of the membership function of the fuzzy set g defined on X.


4. Given the fuzzy densities for a set of sources X, it is important to determine the measures of the elements of the power set $\mathcal{P}(X)$. This is essential in many applications and, as we explain next, can be done by using the $\lambda$-fuzzy measure. Let $A_i = \{x_1, \ldots, x_i\} \subseteq X$. According to (A.4) we can write (see Section B.2)

$$g(A_n) = \sum_{j=1}^{n} g^j + \lambda \sum_{j=1}^{n-1} \sum_{k=j+1}^{n} g^j g^k + \cdots + \lambda^{n-1} g^1 g^2 \cdots g^n \tag{A.5}$$

$$g(A_n) = \frac{1}{\lambda}\left[\prod_{i=1}^{n}\left(1 + \lambda g^i\right) - 1\right], \quad \lambda \neq 0 \tag{A.6}$$

The value of $\lambda$ can then be found by solving the equation

$$g(A_n) = g(X) = 1 \tag{A.7}$$

From (A.6) and (A.7), this is equivalent to solving the following equation for $\lambda$:

$$\lambda + 1 = \prod_{i=1}^{n}\left(1 + \lambda g^i\right) \tag{A.8}$$

Hence, if we know the fuzzy densities $g^i$, $i = 1, \ldots, n$, we can construct the $\lambda$-fuzzy measure: we first solve (A.8) for $\lambda$, and then compute the $g(A_i)$'s using (A.4).

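5. For a fixed set of densities $\{g^i\}$, $0 < g^i < 1$, with $\sum_i g^i \neq 1$, there exists a unique $\lambda \in (-1, \infty)$, $\lambda \neq 0$, which satisfies (A.8). The root is positive (a super-additive measure) when $\sum_i g^i < 1$ and lies in $(-1, 0)$ (a sub-additive measure) when $\sum_i g^i > 1$; when $\sum_i g^i = 1$, $\lambda = 0$ and the measure is additive.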

6. Let $A_i$ be defined according to (A.3). For densities $\{g^i\}$, $0 < g^i < 1$, we have (see Section B.3)

$$0 \le g(A_i) \le 1 \quad \forall i \tag{A.9}$$

with equality when $i = 0$ and $i = n$, i.e., $g(A_0) = 0$ and $g(A_n) = 1$.
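The two-step construction described in Property 4 (solve (A.8) for $\lambda$, then apply the recursion (A.4)) can be sketched in Python as follows. This is a minimal illustration, not the implementation used in this report: the function names `solve_lambda` and `nested_measures` are our own, and plain bisection is our choice of root finder. It assumes $0 < g^i < 1$ and at least two densities, and it relies on the fact that $f(\lambda) = \prod_i (1 + \lambda g^i) - (1 + \lambda)$ is convex on $(-1, \infty)$ with $f(0) = 0$, so the nonzero root lies in $(-1, 0)$ when $\sum_i g^i > 1$ and in $(0, \infty)$ when $\sum_i g^i < 1$.

```python
import math


def solve_lambda(dens, tol=1e-12):
    """Return the nonzero root lambda > -1 of (A.8): lambda + 1 = prod(1 + lambda * g^i).

    Assumes 0 < g^i < 1 and at least two densities. Returns 0.0 when the
    densities sum to 1, i.e. when the measure is additive (Property 5)."""
    def f(lam):
        return math.prod(1.0 + lam * gi for gi in dens) - (1.0 + lam)

    s = sum(dens)
    if math.isclose(s, 1.0, abs_tol=tol):
        return 0.0
    if s > 1.0:                      # sub-additive case: root in (-1, 0)
        lo, hi = -1.0, 0.0
    else:                            # super-additive case: root in (0, inf)
        lo, hi = 0.0, 1.0
        while f(hi) <= 0.0:          # widen the bracket until f turns positive
            hi *= 2.0
    left_positive = s > 1.0          # sign of f just left of the root
    while hi - lo > tol:             # plain bisection
        mid = 0.5 * (lo + hi)
        if (f(mid) > 0.0) == left_positive:
            lo = mid                 # mid lies left of the root
        else:
            hi = mid                 # mid lies right of the root
    return 0.5 * (lo + hi)


def nested_measures(dens_in_order, lam):
    """g(A_1), ..., g(A_n) via the recursion (A.4), starting from g(A_0) = 0."""
    out, g_prev = [], 0.0
    for gi in dens_in_order:
        g_prev = gi + g_prev + lam * gi * g_prev
        out.append(g_prev)
    return out


# Judges' densities and scores from Section A.4.3; densities re-ordered by decreasing score.
dens = [0.8, 0.5, 0.4, 0.7, 0.7]
scores = [0.5, 0.7, 0.2, 0.6, 0.6]
lam = solve_lambda(dens)
order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
print(round(lam, 4))                                   # -0.9943
print(nested_measures([dens[i] for i in order], lam))  # last entry equals 1 up to rounding
```

Because $\lambda$ is solved from (A.8) rather than rounded, the last nested measure $g(A_n)$ comes out as 1 to machine precision, which is a convenient self-check on the root.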
Estimating the individual fuzzy densities $\{g^1, \ldots, g^n\}$ is an important problem in all applications. The behavior of fuzzy integrals (both the Choquet and the Sugeno) is heavily dependent on the choice of these densities. In some applications it is possible to estimate them from training data [10]. For example, in a pattern recognition application where the outputs of different classifiers are fused, the densities could be the performances of the individual classifiers. Liang et al. [6] used a genetic algorithm to determine the fuzzy densities from training data.

A.3 The Choquet Fuzzy Integral

Let h be a measurable function

$$h : X \to [0, 1] \tag{A.10}$$

The Choquet fuzzy integral (CFI) defined below is the integral of h with respect to a fuzzy measure $g_\lambda$. Note that in (A.10), X could be a set of classifiers and $h(x)$ could be the soft output of classifier x (the confidence or evidence grade of the classifier) that an input sample is from a particular class. In general, $X = \{x_1, \ldots, x_n\}$ is a set of information sources and $h(x_i)$ is the confidence grade of source i that a particular hypothesis is true. A $\lambda$-fuzzy measure provides the importance of each subset of the sources in X for this hypothesis evaluation.

To begin, we provide a definition of a fuzzy integral. Given a class of functions $F \subseteq \{h : X \to \mathbb{R}\}$ and a class of fuzzy measures $m \subseteq M$, a functional

$$I : F \times m \to \mathbb{R} \tag{A.11}$$

is a fuzzy integral [2]. Consider a specific function h and a fuzzy measure $g_\lambda$. Then we can define a fuzzy integral as the mapping

$$(h, g) \mapsto I(h, g) \tag{A.12}$$

A number of families of fuzzy integrals, defined in terms of the underlying fuzzy measures, have been described in the literature. We are particularly interested in the Choquet fuzzy integral (CFI), which is a nonlinear functional defined over measurable sets that combines multiple sources of uncertain information. It provides a computational scheme for aggregating information.

The CFI of h over X with respect to a fuzzy measure $g_\lambda$ is defined as

$$E_g(h) = \int_X h \circ g = \sum_{i=1}^{n} g(A_i)\left[h(x_i) - h(x_{i+1})\right] \tag{A.13}$$

where $h(x_1) \ge h(x_2) \ge \cdots \ge h(x_n)$ and $h(x_{n+1}) = 0$. The set $A_i$ is as defined in (A.3), i.e., $A_i = \{x_1, \ldots, x_i\} \subseteq X$, $g(A_n) = g(X)$, and $g(A_0) = 0$. The CFI can also be expressed as (see Section B.4)

$$E_g(h) = \int_X h \circ g = \sum_{i=1}^{n} h(x_i)\left[g(A_i) - g(A_{i-1})\right] \tag{A.14}$$

If the function h is reordered such that $h(x_1) \le h(x_2) \le \cdots \le h(x_n)$, then the CFI has the following form (see Section B.5):

$$E_g(h) = \int_X h \circ g = \sum_{i=1}^{n} g(X - A_{i-1})\left[h(x_i) - h(x_{i-1})\right] \tag{A.15}$$

$$= \sum_{i=1}^{n} g(\{x_i, x_{i+1}, \ldots, x_n\})\left[h(x_i) - h(x_{i-1})\right] \tag{A.16}$$

where, in this ascending ordering, $A_{i-1} = \{x_1, \ldots, x_{i-1}\}$, $A_0 = \emptyset$, and $h(x_0) = 0$.

All three forms of the CFI are equivalent. See Sections B.4 and B.5 for how one form leads to another.
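To make the equivalence of (A.13), (A.14) and (A.16) concrete, the sketch below evaluates all three forms for an arbitrary set function `g_of` that maps a frozenset of source indices to its measure. For the demonstration we apply the closed form (A.6) to an arbitrary subset A (for a $\lambda$-measure, $g(A) = [\prod_{i \in A}(1 + \lambda g^i) - 1]/\lambda$), using the judges' densities and the rounded value $\lambda = -0.9943$ from the example in Section A.4.3. These inputs and all function names are illustrative assumptions. Since the three forms are algebraic rearrangements of one another, they agree for any such set function, including when h has ties.

```python
import math


def _desc(h):
    """Indices of h sorted by decreasing value (stable for ties)."""
    return sorted(range(len(h)), key=lambda i: h[i], reverse=True)


def cfi_a13(h, g_of):
    """(A.13): h in decreasing order; sum of g(A_i) [h(x_i) - h(x_{i+1})], h(x_{n+1}) = 0."""
    order = _desc(h)
    total = 0.0
    for k, i in enumerate(order):
        h_next = h[order[k + 1]] if k + 1 < len(order) else 0.0
        total += g_of(frozenset(order[: k + 1])) * (h[i] - h_next)
    return total


def cfi_a14(h, g_of):
    """(A.14): sum of h(x_i) [g(A_i) - g(A_{i-1})] over the decreasing order."""
    order = _desc(h)
    total, g_prev = 0.0, 0.0
    for k, i in enumerate(order):
        g_cur = g_of(frozenset(order[: k + 1]))
        total += h[i] * (g_cur - g_prev)
        g_prev = g_cur
    return total


def cfi_a16(h, g_of):
    """(A.16): h in increasing order; sum of g({x_i..x_n}) [h(x_i) - h(x_{i-1})], h(x_0) = 0."""
    order = sorted(range(len(h)), key=lambda i: h[i])
    total, h_prev = 0.0, 0.0
    for k, i in enumerate(order):
        total += g_of(frozenset(order[k:])) * (h[i] - h_prev)
        h_prev = h[i]
    return total


# Illustrative lambda-measure: (A.6) applied to an arbitrary subset of sources.
DENS = [0.8, 0.5, 0.4, 0.7, 0.7]
LAM = -0.9943


def lambda_measure(subset):
    if not subset:
        return 0.0
    return (math.prod(1.0 + LAM * DENS[i] for i in subset) - 1.0) / LAM


h = [0.5, 0.7, 0.2, 0.6, 0.6]
for form in (cfi_a13, cfi_a14, cfi_a16):
    print(form.__name__, round(form(h, lambda_measure), 4))   # about 0.645 for all three
```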

In comparison with probability theory, the CFI corresponds to the concept of expectation, and it has found extensive use in combining feature and algorithm confidence values [4]. Important properties of the CFI are as follows (see Appendix C for their proofs):

1. The CFI is a monotonically increasing (non-decreasing) function with respect to $h(x)$ [2].

2. For all $h, g \in [0, 1]$, the range of the CFI is

$$h_{\min} \le E_g(h) \le h_{\max} \tag{A.17}$$

where $h_{\min} = \min(h(x_1), h(x_2), \ldots, h(x_n))$ and $h_{\max} = \max(h(x_1), h(x_2), \ldots, h(x_n))$. The CFI attains its lower bound when $g^i = 0$ for all i, and it attains its upper bound when $g^i = 1$ for all i.

3. If $h(x_i) = c$ for all i, where $0 \le c \le 1$, then

$$E_g(h) = \int_X h \circ g = c \tag{A.18}$$

4. If $h_1(x_i) \le h_2(x_i)$ for all i, then

$$\int_X h_1 \circ g \le \int_X h_2 \circ g \tag{A.19}$$

5. If $B \subseteq X$, $C \subseteq X$ and $B \subseteq C$, then

$$\int_B h \circ g \le \int_C h \circ g \tag{A.20}$$

6. If the $\lambda$-fuzzy measure $g_\lambda$ is a probability measure, i.e., $\sum_{i=1}^{n} g^i = 1$ and $\lambda = 0$, the CFI becomes a weighted average. In the special case where all the fuzzy densities are equal, i.e., $g^i = 1/n$, the CFI is equivalent to the arithmetic mean.

7. If $g^j = 0$ for some j, then

$$E_g(h) = \int_X h \circ g = \sum_{i=1,\, i \neq j}^{n} h(x_i)\left[g(A_i) - g(A_{i-1})\right] \tag{A.21}$$

This property shows that the CFI value is determined only by the input sources that have nonzero densities.
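A quick numerical sanity check of Properties 2, 3, 4, 6 and 7 is sketched below; it is an illustration, not a proof. It assumes a hypothetical additive measure (densities summing to 1, so $\lambda = 0$ and Property 6 applies) and the two extreme set functions behind Property 2 (every $g^i = 1$, and only $g(X) = 1$, which corresponds to every $g^i = 0$), evaluated on random scores. The helper names are our own.

```python
import random


def cfi(h, g_of):
    """Choquet integral in the form (A.14); g_of maps a frozenset of indices to g(A)."""
    order = sorted(range(len(h)), key=lambda i: h[i], reverse=True)
    total, g_prev = 0.0, 0.0
    for k, i in enumerate(order):
        g_cur = g_of(frozenset(order[: k + 1]))
        total += h[i] * (g_cur - g_prev)
        g_prev = g_cur
    return total


# Hypothetical densities summing to 1, so lambda = 0 and g is additive (Property 6).
dens = [0.1, 0.2, 0.3, 0.4]


def additive(A):
    return sum(dens[i] for i in A)


def all_ones(A):        # every g^i = 1: any nonempty set has measure 1
    return 1.0 if A else 0.0


def all_zeros(A):       # every g^i = 0: only g(X) = 1
    return 1.0 if len(A) == len(dens) else 0.0


random.seed(0)
for _ in range(1000):
    h = [random.random() for _ in dens]
    v = cfi(h, additive)
    assert min(h) - 1e-12 <= v <= max(h) + 1e-12                  # Property 2
    assert abs(v - sum(x * w for x, w in zip(h, dens))) < 1e-12    # Property 6
    assert abs(cfi(h, all_ones) - max(h)) < 1e-12                  # upper bound
    assert abs(cfi(h, all_zeros) - min(h)) < 1e-12                 # lower bound
    h2 = [min(1.0, x + 0.1 * random.random()) for x in h]         # h2 >= h pointwise
    assert cfi(h2, additive) >= v - 1e-12                          # Property 4
assert abs(cfi([0.3] * 4, additive) - 0.3) < 1e-12                 # Property 3

# Property 7: a source with zero density cannot influence the result.
dens0 = [0.5, 0.0, 0.5]


def g0(A):
    return sum(dens0[i] for i in A)


assert abs(cfi([0.2, 0.9, 0.6], g0) - cfi([0.2, 0.1, 0.6], g0)) < 1e-12
print("all property checks passed")
```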


A.4 Generic Applications of the Choquet Fuzzy Integral

The CFI aggregates the elements of the source information set X according to a specified criterion, while incorporating the relative importance of each of the elements. In this section, we present the CFI for some simple "textbook" applications to foster a better understanding of the integral and to gain insight into why it works in aggregation.

A.4.1 Worker Productivity

Consider the example of productivity² in a workshop. Let $X = \{x_1, \ldots, x_n\}$ be a set of workers. Suppose that each worker $x_i$ works $h(x_i)$ hours a day from the opening hour. Without loss of generality, the workers are ordered such that $h(x_1) \le h(x_2) \le \cdots \le h(x_n)$, where worker $x_1$ works the least amount of time and worker $x_n$ works the most; thus, for $i \ge 2$, $h(x_i) - h(x_{i-1}) \ge 0$.

The fuzzy measure is defined as the number of products made by the workers in one hour, with the implicit assumption that the productivity of the workers remains constant throughout the day; hence, $g(\{x_i\})$ denotes the number of products made by worker $x_i$ in one hour, and g is a measure of productivity. A group of workers $A \subseteq X$ produces the amount $g(A)$ in one hour. A product can be made either by one worker or by a group of workers.

²This example has been paraphrased from [8].

Workers are assumed to be more productive together: the number of products produced by two or more workers working together is larger than the sum of the products each of them would produce working alone.

Next we show that the CFI can be used to find the total number of products produced by all the workers in one work day. The working hours of all the workers are aggregated in the following way. First, the whole group X of n workers works $h(x_1)$ hours. Next, the group $X - \{x_1\} = \{x_2, x_3, \ldots, x_n\}$ works $h(x_2) - h(x_1)$ hours, as worker $x_1$ is no longer at work. Then the group $X - \{x_1, x_2\} = \{x_3, x_4, \ldots, x_n\}$ works $h(x_3) - h(x_2)$ hours, and so on. Lastly, worker $x_n$ alone works $h(x_n) - h(x_{n-1})$ hours. Since group A produces the amount $g(A)$ in one hour, the total number of products produced by all workers in one day can be expressed as:

$$\begin{aligned}
& h(x_1)\, g(X) + \left[h(x_2) - h(x_1)\right] g(X - \{x_1\}) + \left[h(x_3) - h(x_2)\right] g(X - \{x_1, x_2\}) + \cdots + \left[h(x_n) - h(x_{n-1})\right] g(\{x_n\}) \\
&= \sum_{i=1}^{n} \left[h(x_i) - h(x_{i-1})\right] g(\{x_i, x_{i+1}, \ldots, x_n\}), \quad \text{where } h(x_0) = 0 \\
&= \sum_{i=1}^{n} \left[h(x_i) - h(x_{i-1})\right] g(X - A_{i-1}), \quad \text{where } A_i = \{x_1, \ldots, x_i\} \text{ and } A_0 = \emptyset \\
&= E_g(h)
\end{aligned} \tag{A.22}$$

This example shows that (A.22) fits the definition of the CFI in (A.15), and it demonstrates the aggregation logic behind the CFI.
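The shift-by-shift argument can be checked by brute force. The sketch below uses hypothetical numbers that are not taken from [8]: three workers with integer working hours and a super-additive productivity table g (products per hour for each group). It counts the products hour by hour, with everyone still at work contributing as a group, and compares the total with the CFI in the form (A.15). Integer hours are an assumption made only so that the hour-by-hour loop is exact.

```python
# Hypothetical data: hours worked per day, and products per hour for each group g(A).
hours = {"x1": 2, "x2": 5, "x3": 8}
g = {
    frozenset(): 0,
    frozenset({"x1"}): 1, frozenset({"x2"}): 2, frozenset({"x3"}): 3,
    frozenset({"x1", "x2"}): 4, frozenset({"x1", "x3"}): 5,
    frozenset({"x2", "x3"}): 6, frozenset({"x1", "x2", "x3"}): 8,
}


def simulate_day(hours, g):
    """Count products hour by hour: during hour t, every worker with h > t is present."""
    total = 0
    for t in range(max(hours.values())):
        present = frozenset(w for w, h in hours.items() if h > t)
        total += g[present]
    return total


def choquet_ascending(hours, g):
    """CFI in the form (A.15): sum of [h(x_i) - h(x_{i-1})] * g({x_i, ..., x_n})."""
    order = sorted(hours, key=hours.get)      # increasing working hours
    total, h_prev = 0, 0
    for i, w in enumerate(order):
        remaining = frozenset(order[i:])      # workers still at work
        total += (hours[w] - h_prev) * g[remaining]
        h_prev = hours[w]
    return total


print(simulate_day(hours, g), choquet_ascending(hours, g))   # 43 43
```

Here the whole team works 2 hours (2 x 8 products), the pair $\{x_2, x_3\}$ works 3 more hours (3 x 6), and $x_3$ works the last 3 hours alone (3 x 3), giving 43 either way.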


A.4.2 A Collection of Rare Books

Consider a particularly rare book³ that comes in two volumes. The first and second volumes are denoted by $x_1$ and $x_2$, respectively. The fuzzy measure is defined as the price of the volumes: the price of the first volume is given by $g(\{x_1\})$, the price of the second volume by $g(\{x_2\})$, and the price of the complete set by $g(\{x_1, x_2\})$. The complete set is considered to be more valuable than the two volumes bought separately; hence, this fuzzy measure is super-additive (it behaves like a $\lambda$-fuzzy measure with $\lambda > 0$), since

$$g(\{x_1, x_2\}) > g(\{x_1\}) + g(\{x_2\}) \tag{A.23}$$

³This example has been paraphrased from [8].

A certain person sells $h(x_1)$ copies of the first volume and $h(x_2)$ copies of the second volume. Without loss of generality we can assume that $h(x_1) \le h(x_2)$. The number of complete sets (both volumes) sold is $h(x_1)$, and the number of copies of the second volume sold separately is $h(x_2) - h(x_1)$. The total amount of money the seller gets is

$$h(x_1)\, g(\{x_1, x_2\}) + \left[h(x_2) - h(x_1)\right] g(\{x_2\}) \tag{A.24}$$

This expression has the same form as (A.15) and is another example of combining measurable functions with respect to a fuzzy measure using the CFI.
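A small numerical instance of (A.24) is sketched below. The prices and sales figures are hypothetical, chosen only so that (A.23) holds, and the function name `book_revenue` is ours. The function sorts the two volumes by copies sold, so the ordering $h(x_1) \le h(x_2)$ assumed in the text does not have to hold for the input.

```python
def book_revenue(copies, price):
    """Seller's income per (A.24): volumes are paired into complete sets as far as
    possible, and the surplus of the better-selling volume is sold on its own.

    copies: {"x1": h(x1), "x2": h(x2)}; price: g(A) keyed by frozenset."""
    (low, n_low), (high, n_high) = sorted(copies.items(), key=lambda kv: kv[1])
    complete_sets = n_low                   # h(x1) after ordering
    singles = n_high - n_low                # h(x2) - h(x1)
    return complete_sets * price[frozenset({low, high})] + singles * price[frozenset({high})]


# Hypothetical prices: the complete set is worth more than its parts, as in (A.23).
price = {frozenset({"x1"}): 30, frozenset({"x2"}): 40, frozenset({"x1", "x2"}): 100}
print(book_revenue({"x1": 5, "x2": 8}, price))   # 5*100 + 3*40 = 620
```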

A.4.3 Multiple Judges of a Sporting Event

A numerical example that demonstrates the calculation of the CFI is presented next; it is adapted from [5].


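Table 7.3: Scores for the participant u from the five judges

| Unordered judge $x_i$ | Score $h(x_i)$ | Expertise $g(x_i)$ | Ordered judge $x'_i$ | Score $h(x'_i)$ | Expertise $g(x'_i)$ |
|---|---|---|---|---|---|
| 1 | 0.5 | 0.8 | 2 | 0.7 | 0.5 |
| 2 | 0.7 | 0.5 | 4 | 0.6 | 0.7 |
| 3 | 0.2 | 0.4 | 5 | 0.6 | 0.7 |
| 4 | 0.6 | 0.7 | 1 | 0.5 | 0.8 |
| 5 | 0.6 | 0.7 | 3 | 0.2 | 0.4 |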

Let the set $X = \{x_1, \ldots, x_5\}$ represent five judges at a sporting event. Assume that the participant u has obtained the scores shown in Table 7.3 from the $n = 5$ judges. Experts rate the judges' expertise, and their ratings are also shown in the table. A rating of 1 indicates that the judge is an expert, while a rating of zero indicates a totally unknowledgeable judge (naturally, a judge with a rating of zero would not be considered for aggregation). The expertise of the judges can be considered to be the densities $g^i$ of the $\lambda$-fuzzy measure. The judges' scores need to be aggregated to determine the final score for the participant; hence, the scores become the values of the function h aggregated in the CFI.

Given the fuzzy densities $g^i$ of the set X, $\lambda$ has to be computed using (A.8). Solving this equation with the density set [0.8, 0.5, 0.4, 0.7, 0.7], we get a unique nonzero root greater than $-1$, which is $\lambda = -0.9943$. Next, the fuzzy measures of the nested subsets $A_i$ can be determined by using (A.4).

The aggregate score is computed using the CFI by first ordering the scores and the corresponding densities such that the scores are in decreasing order. Prior to ordering, the scores are assigned as $h(x_1) = 0.5$, $h(x_2) = 0.7$, ..., $h(x_5) = 0.6$. Let X' be the ordered set, so that $h(x'_1) \ge h(x'_2) \ge \cdots \ge h(x'_n)$. Then,

$$[0.5, 0.7, 0.2, 0.6, 0.6] \to [0.7, 0.6, 0.6, 0.5, 0.2] \quad \text{(scores)} \tag{A.25}$$

$$[0.8, 0.5, 0.4, 0.7, 0.7] \to [0.5, 0.7, 0.7, 0.8, 0.4] \quad \text{(densities)} \tag{A.26}$$

After ordering, the scores are assigned (see Table 7.3) as $h(x'_1) = 0.7$, $h(x'_2) = 0.6$, ..., $h(x'_5) = 0.2$, and the corresponding densities are $g(x'_1) = 0.5$, $g(x'_2) = 0.7$, ..., $g(x'_5) = 0.4$. The new arrangement of the judges (set X') according to the reordering is $X' = \{x'_1, x'_2, x'_3, x'_4, x'_5\} = \{x_2, x_4, x_5, x_1, x_3\}$. Let $A_i = \{x'_1, \ldots, x'_i\}$. The fuzzy measures for all $A_i$, $i = 1, \ldots, 5$, are computed recursively according to (A.4) as:

$$\begin{aligned}
g(A_1) &= 0.5 \\
g(A_2) &= 0.7 + 0.5 - 0.9943(0.7)(0.5) = 0.8520 \\
g(A_3) &= 0.7 + 0.8520 - 0.9943(0.7)(0.8520) = 0.9590 \\
g(A_4) &= 0.8 + 0.9590 - 0.9943(0.8)(0.9590) = 0.9962 \\
g(A_5) &= 0.4 + 0.9962 - 0.9943(0.4)(0.9962) = 1.0
\end{aligned} \tag{A.27}$$

Since $n = 5$, $g(A_5) = g(X) = 1.0$. The CFI can now be computed using (A.13) as:

$$\begin{aligned}
E_g(h) &= (0.7 - 0.6)(0.5) + (0.6 - 0.6)(0.8520) + (0.6 - 0.5)(0.9590) \\
&\quad + (0.5 - 0.2)(0.9962) + (0.2 - 0)(1.0) \\
&\approx 0.64
\end{aligned} \tag{A.28}$$

According to the CFI, the aggregated score of participant u is 0.64.
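The computation in (A.27) and (A.28) can be reproduced with a few lines of Python. This sketch takes $\lambda = -0.9943$ as reported above rather than re-solving (A.8), and the variable names are ours. Python's `sorted(..., reverse=True)` is stable, so the tied judges $x_4$ and $x_5$ keep their original order, matching X' above.

```python
scores = [0.5, 0.7, 0.2, 0.6, 0.6]       # h(x_1), ..., h(x_5) from Table 7.3
expertise = [0.8, 0.5, 0.4, 0.7, 0.7]    # densities g^1, ..., g^5 from Table 7.3
lam = -0.9943                            # root of (A.8) as reported in the text

# Decreasing-score order; sorted() is stable, so x4 stays ahead of x5.
order = sorted(range(5), key=lambda i: scores[i], reverse=True)
h = [scores[i] for i in order] + [0.0]   # append h(x'_6) = 0

g_A, g_prev = [], 0.0
for i in order:                          # recursion (A.4), starting from g(A_0) = 0
    g_prev = expertise[i] + g_prev + lam * expertise[i] * g_prev
    g_A.append(g_prev)

cfi = sum(g_A[k] * (h[k] - h[k + 1]) for k in range(5))   # (A.13)
print([i + 1 for i in order])            # [2, 4, 5, 1, 3]
print([round(v, 4) for v in g_A])        # [0.5, 0.852, 0.959, 0.9962, 1.0]
print(round(cfi, 2))                     # 0.64
```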

In this example we cannot use the densities directly as the weights of a weighted average to aggregate the judges' scores, because the sum of the densities is $\sum_{i=1}^{5} g(x_i) = 3.10 > 1$. In order to compute the weighted average, we have to normalize the densities as

$$w_i = \frac{g(x_i)}{\sum_{j=1}^{5} g(x_j)} \tag{A.29}$$

The scores and the normalized weights are given in Table 7.3.

Table 7.3: Scores for participant u from the five judges and their corresponding weights

| Judge $x_i$ | Score $h(x_i)$ | Normalized expertise $w_i$ |
|---|---|---|
| 1 | 0.5 | 0.26 |
| 2 | 0.7 | 0.16 |
| 3 | 0.2 | 0.13 |
| 4 | 0.6 | 0.225 |
| 5 | 0.6 | 0.225 |

The weighted average of the judges' scores can now be computed as

$$W_g(h) = \sum_{i=1}^{5} h(x_i)\, w_i = 0.5(0.26) + 0.7(0.16) + 0.2(0.13) + 0.6(0.225) + 0.6(0.225) = 0.538 \tag{A.30}$$

If the normalized weights were considered as a $\lambda$-fuzzy measure, then $\lambda = 0$, and hence the CFI would reduce to the weighted average (see Property 6 in Section A.3).

The aggregated result of the CFI is higher than that of the weighted average because it takes into account the increased value, or in this case the increased expertise, of two or more judges who agree on a particular score. This occurs because the $\lambda$-fuzzy measure is computed for subsets of the set of judges X, not only for the individual judges.
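The normalization (A.29) and the weighted average (A.30) are sketched below. With unrounded weights the average is $1.67/3.1 \approx 0.539$; the value 0.538 in (A.30) comes from the two-decimal weights listed in the table. The last lines are a hedged illustration of the explanation above, not part of the original example: for the two judges who agree on a score of 0.6 ($x_4$ and $x_5$), they compare the $\lambda$-measure of the pair, computed with (A.2) and the text's $\lambda = -0.9943$, with the sum of the pair's normalized weights.

```python
scores = [0.5, 0.7, 0.2, 0.6, 0.6]       # h(x_i) from Table 7.3
expertise = [0.8, 0.5, 0.4, 0.7, 0.7]    # g(x_i) from Table 7.3
total = sum(expertise)                   # 3.1 > 1, so the densities must be normalized

weights = [g / total for g in expertise]                 # (A.29)
wavg = sum(h * w for h, w in zip(scores, weights))       # (A.30) with unrounded weights
print(round(wavg, 3))                                    # 0.539

# Judges x4 and x5 (indices 3 and 4) both award 0.6.
lam = -0.9943                                            # from Section A.4.3
pair_measure = expertise[3] + expertise[4] + lam * expertise[3] * expertise[4]   # (A.2)
pair_weight = weights[3] + weights[4]
print(round(pair_measure, 3), round(pair_weight, 3))     # 0.913 0.452
```

The pair carries a measure of about 0.913 in the CFI but only about 0.452 of the total weight in the weighted average, which is consistent with the CFI score being the higher of the two.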

A.5 The Sugeno Fuzzy Integral

Let $h : X \to [0, 1]$ be a measurable function, ordered such that $h(x_1) \ge h(x_2) \ge \cdots \ge h(x_n)$. Let $A_i = \{x_1, \ldots, x_i\} \subseteq X$, where $X = \{x_1, \ldots, x_n\}$ is a finite set. The $A_i$'s form a sequence of nested sets $A_1, \ldots, A_n$, starting from $A_1 = \{x_1\}$ and then adding the elements $x_2$ to $x_n$ one at a time to get $A_n = X$, as mentioned earlier in Property 3 in Section A.2. Let $g^i = g_\lambda(\{x_i\})$ be the fuzzy densities of the set X. The Sugeno fuzzy integral (SFI) with respect to a fuzzy measure g is given by [9]:

$$F_g(h) = \int_X h \circ g = \sup_{\alpha \in [0, 1]} \left\{ t\left(\alpha, g(H_\alpha)\right) \right\} \tag{A.31}$$

where $H_\alpha$ is the α-cut of h and t is a t-norm.

The α-cut of a fuzzy set A on U is the set $A_\alpha = \{u \mid u \in U, \mu_A(u) \ge \alpha\}$, where $\mu_A(u)$ is the membership function of A. In our case the α-cut of h is the set $H_\alpha = \{u \mid u \in X, h(u) \ge \alpha\}$. In Sugeno's original formula, the t-norm used was the minimum. In order to use (A.31), the t-norm should satisfy the distributive law under the supremum operator [4]. Examples of such t-norms are the minimum, product, bounded difference, and drastic product.

Figure A.1: Plot of $h(x)$ versus x. Note that the α-cut shown in this figure is the set $A_i$; this is also the $h(x_i)$-th cut. The continuous curve is for purposes of illustration only; it actually consists of lines connecting the discrete values $h(x_i)$ for $x \in \{x_1, \ldots, x_n\}$.

Since X is finite, h has at most n different α-cuts, ranging from $H_0 = X$ to $H_{\mathrm{hgt}(h)}$, where hgt(h) is the height (maximum value) of h. The latter contains only the element(s) at which h reaches its maximum. Since $h(x_1) \ge h(x_2) \ge \cdots \ge h(x_n)$ and $A_j = \{x_1, \ldots, x_j\} \subseteq X$, each $A_j$ is the $h(x_j)$-cut of h; thus, (A.31) can be expressed as

$$F_g(h) = \int_X h \circ g = \max_{j = 1, \ldots, n} \left\{ t\left(h(x_j), g(A_j)\right) \right\} \tag{A.32}$$

which is computationally simpler than (A.31). In Fig. A.1 the $h(x_i)$-th cut of h is shown; this is the set $\{x_1, \ldots, x_i\}$, which we have defined to be $A_i$. In applications of the SFI to pattern recognition, the t-norm most commonly used is the minimum; hence, the most common form of the SFI is given by

$$F_g(h) = \int_X h \circ g = \bigvee_{j=1}^{n} \left[h(x_j) \wedge g(A_j)\right] \tag{A.33}$$

where the values of $g(A_j)$ are determined recursively according to (A.4). Figure A.2 provides a graphical interpretation of the SFI, which attempts to find the best consensus between the function value and the importance attribute. This is where the SFI fundamentally differs from the CFI: while the CFI quantitatively weights the function by the jump in the fuzzy measure, the SFI associates each function value with the corresponding importance measure.

Figure A.2: Graphical illustration of the calculation of the SFI

Consider the worker-productivity problem in Section A.4.1. The SFI finds the best consensus between the number of work hours and productivity, whereas the CFI computes the improvement in productivity weighted by the number of hours. Clearly, the CFI makes more heuristic sense in this quantitative framework.

Next, consider the numerical example in Section A.4.3, where judges scored participants at a sporting event. Recall that participant u has obtained the scores shown in Table 7.3 from the five judges; the densities $g^i$ of the $\lambda$-fuzzy measure, which denote the expertise of the judges, are given in (A.26), and the resulting fuzzy measures $g(A_i)$ are given in (A.27). The scores obtained for participant u can now be aggregated using the SFI to produce a final score. The scores are again reordered so that $h(x'_1) \ge h(x'_2) \ge \cdots \ge h(x'_n)$, where $X' = \{x'_1, x'_2, x'_3, x'_4, x'_5\} = \{x_2, x_4, x_5, x_1, x_3\}$ is the reordered set of judges as in (A.25), and the corresponding densities are also reordered as in (A.26). Using the definition of the SFI in (A.33) and the computed values of $g(A_i)$ in (A.27), we find:
$$\begin{aligned}
F_g(h) &= \max\left[\min(0.7, 0.5), \min(0.6, 0.8520), \min(0.6, 0.9590), \min(0.5, 0.9962), \min(0.2, 1)\right] \\
&= \max\left[0.5, 0.6, 0.6, 0.5, 0.2\right] \\
&= 0.6
\end{aligned}$$

We see that the aggregated value of the scores $F_g(h)$, according to the stated fuzzy measure g and using the SFI, is 0.6. In this sense, the SFI computes the highest score that the judges agree upon, taking their expertise into account: the score 0.6 is awarded or exceeded by the group $A_2 = \{x_2, x_4\}$, whose combined expertise $g(A_2) = 0.8520$ is larger than 0.6.
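The SFI computation above can be sketched in Python as follows. The function follows (A.32), with the t-norm passed as a parameter (our own illustrative choice); the default `min` gives the common form (A.33). As in the CFI example, $\lambda = -0.9943$ is taken from the text, and the nested measures $g(A_j)$ are built on the fly with the recursion (A.4).

```python
def sugeno_integral(h, dens, lam, tnorm=min):
    """SFI in the form (A.32): max_j t(h(x_j), g(A_j)), with h in decreasing order
    and g(A_j) built by the recursion (A.4). tnorm=min gives (A.33)."""
    order = sorted(range(len(h)), key=lambda i: h[i], reverse=True)
    best, g_prev = 0.0, 0.0
    for i in order:
        g_prev = dens[i] + g_prev + lam * dens[i] * g_prev   # g(A_j)
        best = max(best, tnorm(h[i], g_prev))
    return best


scores = [0.5, 0.7, 0.2, 0.6, 0.6]
expertise = [0.8, 0.5, 0.4, 0.7, 0.7]
print(sugeno_integral(scores, expertise, -0.9943))        # 0.6
# The product t-norm, another choice listed in Section A.5, gives a different value.
print(round(sugeno_integral(scores, expertise, -0.9943, tnorm=lambda a, b: a * b), 3))
```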

In general, the CFI is used when the problem framework involves quantitative measures, and the SFI is used when qualitative measures (e.g., expertise, accuracy, talent) define the importance attribute. An application where the densities are quantitative measures, and hence the CFI is used, is described in Section 6.C.

The Generalized Sugeno Fuzzy Integral (GSFI) is an extended version of the SFI that is formed when each $h(x_i)$ is not a real value in [0, 1] but a fuzzy number within the universal set [0, 1]; it was presented by Auephanwiriyakul et al. [1]. In the case of the numerical example just presented, if the judges had scored the participants qualitatively (e.g., good, excellent), then the scores would themselves be fuzzy. In this case the GSFI would be used to aggregate the scores. The output of the GSFI is a type-1 fuzzy set.


Appendix  B 

Derivation  of  Some  of  the  Properties  of 
Fuzzy  Measures 


B.1 Derivation of (A.4)

We know from (A.2) and (A.3) that for a $\lambda$-fuzzy measure

$$g(A_1) = g(\{x_1\}) = g^1 \tag{B.1}$$

$$g(A_2) = g(\{x_1, x_2\}) = g(\{x_1\} \cup \{x_2\}) = g^1 + g^2 + \lambda g^1 g^2 \tag{B.2}$$

$$g(A_3) = g(\{x_1, x_2, x_3\}) = g(\{x_1, x_2\} \cup \{x_3\}) = g(A_2) + g^3 + \lambda g^3 g(A_2) \tag{B.3}$$

From (B.1), (B.2), and (B.3), and extrapolating to n, we find

$$g(A_n) = g(A_{n-1} \cup \{x_n\}) = g^n + g(A_{n-1}) + \lambda g^n g(A_{n-1}) \tag{B.4}$$

Note that $g(A_n) = g(X) = 1$ and $g(A_0) = g(\emptyset) = 0$ (by definition); therefore, we can state in general that

$$g(A_i) = g(A_{i-1} \cup \{x_i\}) = g^i + g(A_{i-1}) + \lambda g^i g(A_{i-1}), \quad 1 \le i \le n \tag{B.5}$$
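B.2 Derivation of (A.6)

Eqn (B.2) can be written as

$$g(A_2) = \sum_{j=1}^{2} g^j + \lambda g^1 g^2 \tag{B.6}$$

Multiplying and dividing by $\lambda$ and then adding and subtracting 1, we get

$$g(A_2) = \frac{1}{\lambda}\left[1 + \lambda g^1 + \lambda g^2 + \lambda^2 g^1 g^2 - 1\right] \tag{B.7}$$

$$= \frac{1}{\lambda}\left[\left(1 + \lambda g^1\right)\left(1 + \lambda g^2\right) - 1\right] \tag{B.8}$$

$$= \frac{1}{\lambda}\left[\prod_{i=1}^{2}\left(1 + \lambda g^i\right) - 1\right] \tag{B.9}$$

Similarly, for $g(A_3)$, substituting (B.9) in (B.3), we see that

$$g(A_3) = \frac{1}{\lambda}\left[\prod_{i=1}^{2}\left(1 + \lambda g^i\right) - 1\right] + g^3 + \lambda g^3 \cdot \frac{1}{\lambda}\left[\prod_{i=1}^{2}\left(1 + \lambda g^i\right) - 1\right] \tag{B.10}$$

$$= \frac{1}{\lambda}\left[\prod_{i=1}^{2}\left(1 + \lambda g^i\right) - 1\right]\left(1 + \lambda g^3\right) + g^3 \tag{B.11}$$

$$= \frac{1}{\lambda}\left[\prod_{i=1}^{3}\left(1 + \lambda g^i\right) - \left(1 + \lambda g^3\right)\right] + g^3 \tag{B.12}$$

$$= \frac{1}{\lambda}\left[\prod_{i=1}^{3}\left(1 + \lambda g^i\right) - 1\right] \tag{B.13}$$

For $n = k$, we assume that

$$g(A_k) = \frac{1}{\lambda}\left[\prod_{i=1}^{k}\left(1 + \lambda g^i\right) - 1\right] \tag{B.14}$$

Now, for $n = k + 1$, we have (according to (B.5))

$$g(A_{k+1}) = g^{k+1} + g(A_k) + \lambda g^{k+1} g(A_k) \tag{B.15}$$

$$= \frac{1}{\lambda}\left[\prod_{i=1}^{k}\left(1 + \lambda g^i\right) - 1\right] + g^{k+1} + \lambda g^{k+1} \cdot \frac{1}{\lambda}\left[\prod_{i=1}^{k}\left(1 + \lambda g^i\right) - 1\right] \tag{B.16}$$

$$= \frac{1}{\lambda}\left[\prod_{i=1}^{k}\left(1 + \lambda g^i\right) - 1\right]\left(1 + \lambda g^{k+1}\right) + g^{k+1} \tag{B.17}$$

$$= \frac{1}{\lambda}\left[\prod_{i=1}^{k+1}\left(1 + \lambda g^i\right) - \left(1 + \lambda g^{k+1}\right)\right] + g^{k+1} \tag{B.18}$$

$$= \frac{1}{\lambda}\left[\prod_{i=1}^{k+1}\left(1 + \lambda g^i\right) - 1\right] \tag{B.19}$$

From (B.9), (B.14), (B.19), and the induction hypothesis, we conclude that

$$g(A_n) = \frac{1}{\lambda}\left[\prod_{i=1}^{n}\left(1 + \lambda g^i\right) - 1\right], \quad \lambda \neq 0 \tag{B.20}$$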
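The induction can also be summarized in one line: (B.5) is equivalent to $1 + \lambda g(A_i) = (1 + \lambda g^i)(1 + \lambda g(A_{i-1}))$, so $1 + \lambda g(A_n) = \prod_{i=1}^{n}(1 + \lambda g^i)$ because $1 + \lambda g(A_0) = 1$. The sketch below checks numerically that the recursion (B.5) and the closed form (B.20) agree for random densities and random $\lambda > -1$, $\lambda \neq 0$. Note that $\lambda$ here does not need to solve (A.8); the identity holds for any admissible $\lambda$. The function names and sampling ranges are our own choices.

```python
import math
import random


def recursive_g(dens, lam):
    """g(A_n) from the recursion (B.5), starting at g(A_0) = 0."""
    g = 0.0
    for gi in dens:
        g = gi + g + lam * gi * g
    return g


def closed_form_g(dens, lam):
    """g(A_n) = (prod(1 + lam * g^i) - 1) / lam, i.e. (B.20); requires lam != 0."""
    return (math.prod(1.0 + lam * gi for gi in dens) - 1.0) / lam


random.seed(1)
for _ in range(1000):
    n = random.randint(1, 8)
    dens = [random.random() for _ in range(n)]
    lam = random.uniform(-0.99, 5.0)
    if abs(lam) < 1e-3:          # skip lambda close to 0, where (B.20) is ill-conditioned
        continue
    assert math.isclose(recursive_g(dens, lam), closed_form_g(dens, lam),
                        rel_tol=1e-9, abs_tol=1e-12)
print("recursion (B.5) matches closed form (B.20)")
```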

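B.3 Derivation of (A.9)

From the definition of the fuzzy measure (see (A.1)) and (A.7), we have $0 \le g^i \le 1$ and $g(X) = g(A_n) = 1$. From (A.4), we get

$$g(A_{i-1}) = \frac{g(A_i) - g^i}{1 + \lambda g^i} \tag{B.21}$$

When $i = n$, (B.21) becomes

$$g(A_{n-1}) = \frac{1 - g^n}{1 + \lambda g^n} \tag{B.22}$$

Since $\lambda > -1$, the denominator $1 + \lambda g^n$ is positive and $1 - g^n \le 1 + \lambda g^n$, so $0 \le g(A_{n-1}) \le 1$; consequently $0 \le g(A_{n-2}) \le 1$ [by substituting $i = n - 1$ into (B.21)]. Similarly, $0 \le g(A_i) \le 1$ for all i. The upper bound is attained at $i = n$, i.e., $g(A_n) = 1$ [by (A.7)], and the lower bound at $i = 0$, i.e., $g(A_0) = 0$.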


B.4 Derivation of (A.14)

Expand (A.13) around the terms $k - 1$ and $k$ as follows:

$$E_g(h) = \cdots + g(A_{k-1})\left[h(x_{k-1}) - h(x_k)\right] + g(A_k)\left[h(x_k) - h(x_{k+1})\right] + \cdots \tag{B.23}$$

The value $h(x_k)$ appears only in these two terms, with coefficients $-g(A_{k-1})$ and $g(A_k)$. Collecting terms with $h(x_k)$ as the common factor, we obtain

$$E_g(h) = \cdots + h(x_k)\left[g(A_k) - g(A_{k-1})\right] + \cdots \tag{B.24}$$

which, when summed over $k = 1, \ldots, n$ (using $g(A_0) = 0$ and $h(x_{n+1}) = 0$), leads to (A.14).

B.5 Derivations of (A.15) and (A.16)

Let $Z = \{z_1, \ldots, z_n\}$ such that $z_1 = x_n$, $z_2 = x_{n-1}$, ..., $z_n = x_1$. Because $h(x_1) \le h(x_2) \le \cdots \le h(x_n)$, it follows that $h(z_1) \ge h(z_2) \ge \cdots \ge h(z_n)$. This one-to-one correspondence between X and Z can be specified by the relation $x_i = z_{n-i+1}$. Replacing x by z in (A.16), we find

$$E_g(h) = \sum_{i=1}^{n} g(\{z_{n-i+1}, z_{n-i}, \ldots, z_1\})\left[h(z_{n-i+1}) - h(z_{n-i+2})\right] \tag{B.25}$$

where $h(z_{n+1}) = h(x_0) = 0$. Letting $j = n - i + 1$ and $B_j = \{z_1, \ldots, z_j\}$, (B.25) can be expressed as:

$$E_g(h) = \sum_{j=1}^{n} g(\{z_j, z_{j-1}, \ldots, z_1\})\left[h(z_j) - h(z_{j+1})\right] \tag{B.26}$$

$$= \sum_{j=1}^{n} g(B_j)\left[h(z_j) - h(z_{j+1})\right] \tag{B.27}$$

which is exactly the form of (A.13). This proves that (A.16) is equivalent to the CFI in (A.13). Finally, (A.16) is equivalent to (A.15) because $X = \{x_1, x_2, \ldots, x_n\}$ and $A_{i-1} = \{x_1, x_2, \ldots, x_{i-1}\}$, so $g(X - A_{i-1}) = g(\{x_i, x_{i+1}, \ldots, x_n\})$.
Appendix C

Proofs  of  the  Properties  of  the  CFI 

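C.1 Property 1

Let us take the partial derivative of the CFI in (A.14) with respect to $h(x_i)$ for any i, i.e.,

$$\frac{\partial E_g(h)}{\partial h(x_i)} = g(A_i) - g(A_{i-1}) \tag{C.1}$$

Using (A.4) to expand $g(A_i)$, (C.1) can be expressed as

$$\frac{\partial E_g(h)}{\partial h(x_i)} = g^i + \lambda g^i g(A_{i-1}) = g^i\left(1 + \lambda g(A_{i-1})\right) \tag{C.2}$$

Because $g^i \ge 0$, $0 \le g(A_{i-1}) \le 1$, and $\lambda > -1$, the expression $g^i\left(1 + \lambda g(A_{i-1})\right)$ is nonnegative, which proves that the CFI is a monotonically increasing (non-decreasing) function with respect to $h(x)$.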

C.2 Property 2

When $g^i = 1$ for all $i = 1, 2, \ldots, n$, then $g(A_i) = 1$ for all $i = 1, 2, \ldots, n$ (from (A.7) and (A.4)); therefore, the CFI becomes (using (A.14))

$$E_g(h) = \sum_{i=1}^{n} h(x_i)\left[g(A_i) - g(A_{i-1})\right] = h(x_1) \tag{C.3}$$

since $g(A_0) = 0$. Because $h(x_1) \ge h(x_2) \ge \cdots \ge h(x_n)$, it follows that

$$h(x_1) = \max(h(x_1), h(x_2), \ldots, h(x_n)) \tag{C.4}$$

When $g^i = 0$ for all i, then $g(A_n) = 1$ and $g(A_i) = 0$ for all $i \neq n$ (from (A.7) and (A.4)). The CFI then becomes (using (A.14))

$$E_g(h) = \sum_{i=1}^{n} h(x_i)\left[g(A_i) - g(A_{i-1})\right] = h(x_n) \tag{C.5}$$

Because $h(x_1) \ge h(x_2) \ge \cdots \ge h(x_n)$, it follows that

$$h(x_n) = \min(h(x_1), h(x_2), \ldots, h(x_n)) \tag{C.6}$$

When $0 < g^i < 1$ for all i, then $g(A_n) = 1$ and $0 \le g(A_i) \le 1$ for all i. The weights $g(A_i) - g(A_{i-1})$ in (A.14) are nonnegative (see (C.2)) and sum to $g(A_n) - g(A_0) = 1$, so $E_g(h)$ is a convex combination of the values $h(x_i)$ and therefore satisfies $h_{\min} \le E_g(h) \le h_{\max}$. The maximum value $h_{\max}$ is obtained when all the density values are one, and the minimum value $h_{\min}$ when all the densities equal zero.

C.3 Property 3

Using (A.13), $g(A_n) = 1$, and $h(x_{n+1}) = 0$, we have

$$E_g(h) = \sum_{i=1}^{n} g(A_i)\left[h(x_i) - h(x_{i+1})\right] = g(A_n)\, h(x_n) = h(x_n) = c \tag{C.7}$$

since all the other terms vanish: $h(x_i) - h(x_{i+1}) = c - c = 0$ for $i = 1, \ldots, n - 1$.

C.4 Property 4

In (A.14), let $a_i = g(A_i) - g(A_{i-1})$ for all i. In (A.19), g is the same for both $h_1$ and $h_2$; hence,

$$\begin{aligned}
\int_X h_1 \circ g &= \sum_{i=1}^{n} h_1(x_i)\left[g(A_i) - g(A_{i-1})\right] = h_1(x_1) a_1 + h_1(x_2) a_2 + \cdots + h_1(x_n) a_n \\
&\le h_2(x_1) a_1 + h_2(x_2) a_2 + \cdots + h_2(x_n) a_n = \int_X h_2 \circ g
\end{aligned} \tag{C.8}$$

since $h_1(x_i) \le h_2(x_i)$ and $a_i \ge 0$ (see (C.2)) for all i.


C.5 Property 5

Let $g^i$ ($i = 1, \ldots, n$) be the fuzzy densities of the universal set X, $B = \{x_1, \ldots, x_l\} \subseteq X$, $C = \{x_1, \ldots, x_m\}$, and $l < m \le n$. Then, using (A.13), we see that

$$\begin{aligned}
\int_C h \circ g &= \sum_{i=1}^{m} g(A_i)\left[h(x_i) - h(x_{i+1})\right] \\
&= \sum_{i=1}^{l} g(A_i)\left[h(x_i) - h(x_{i+1})\right] + \sum_{i=l+1}^{m} g(A_i)\left[h(x_i) - h(x_{i+1})\right] \\
&= \int_B h \circ g + \sum_{i=l+1}^{m} g(A_i)\left[h(x_i) - h(x_{i+1})\right] \ge \int_B h \circ g
\end{aligned} \tag{C.9}$$

because $g(A_i) \ge 0$ and $h(x_i) - h(x_{i+1}) \ge 0$.


C.6 Property 6

Given that

$$g(A_n) = 1 = \sum_{j=1}^{n} g^j + f(\lambda) \tag{C.10}$$

where $f(\lambda)$ is a function of $\lambda$ as in (A.5), and $\sum_{j=1}^{n} g^j = 1$, we see that $\lambda = 0$ for this equation to be satisfied, since $f(0) = 0$. In this case, the $g(A_i)$'s become additive measures, i.e.,

$$g(A_i) = \sum_{j=1}^{i} g^j \tag{C.11}$$

and the CFI in (A.14) simplifies to (refer to (A.5)):

$$E_g(h) = \sum_{i=1}^{n} h(x_i)\left[g(A_i) - g(A_{i-1})\right] = \sum_{i=1}^{n} h(x_i)\left[\sum_{j=1}^{i} g^j - \sum_{j=1}^{i-1} g^j\right] = \sum_{i=1}^{n} h(x_i)\, g^i \tag{C.12}$$

which is a weighted average. In the specific case when the densities all equal $1/n$, (C.12) simplifies further to:

$$E_g(h) = \frac{1}{n} \sum_{i=1}^{n} h(x_i) \tag{C.13}$$

which is the arithmetic mean.


C.7 Property 7

If $g^j = 0$, then according to (A.4)

$$g(A_j) = g^j + g(A_{j-1}) + \lambda g^j g(A_{j-1}) = g(A_{j-1}) \tag{C.14}$$

In this case, (A.14) becomes

$$E_g(h) = \sum_{i=1}^{n} h(x_i)\left[g(A_i) - g(A_{i-1})\right] = \sum_{i=1,\, i \neq j}^{n} h(x_i)\left[g(A_i) - g(A_{i-1})\right] \tag{C.15}$$

since $g(A_j) - g(A_{j-1}) = 0$, so the term with $i = j$ vanishes.